CLIP, OpenCLIP, and SigLIP: Contrastive Language-Image Pretraining

Radford et al. 2021 (CLIP) trained two encoders, one for images and one for text, with a symmetric InfoNCE objective on 400M web-scraped image-text pairs. The result is a shared embedding space that powers zero-shot classification and retrieval and serves as the visual backbone of most modern vision-language models. This page covers the contrastive objective as a mutual-information bound, the OpenCLIP scaling laws (Cherti et al. 2023), the SigLIP pairwise-sigmoid alternative (Zhai et al. 2023), the modality gap (Liang et al. 2022), and the practical pipeline from training corpus to LLaVA-style VLM backbone.
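
Below is a minimal sketch of the two objectives, operating on a batch of pre-computed, L2-normalized embeddings. The function names and the fixed temperature are illustrative assumptions, not code from any of the cited papers; in both CLIP and SigLIP the temperature (and SigLIP's bias) are learned parameters, frozen here for simplicity.

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE (Radford et al. 2021).

    Each of the N image-text pairs is a positive; the other N-1
    entries in its row/column of the similarity matrix act as negatives.
    """
    logits = img_emb @ txt_emb.T / temperature    # (N, N) cosine similarities
    labels = torch.arange(len(logits))            # positives lie on the diagonal
    loss_i2t = F.cross_entropy(logits, labels)    # softmax over texts per image
    loss_t2i = F.cross_entropy(logits.T, labels)  # softmax over images per text
    return (loss_i2t + loss_t2i) / 2

def siglip_loss(img_emb, txt_emb, temperature=0.07, bias=-10.0):
    """Pairwise sigmoid loss (Zhai et al. 2023).

    Every (i, j) cell is an independent binary classification:
    label +1 on the diagonal, -1 off it, with no softmax normalization
    over the batch. The -10 bias offsets the heavy negative imbalance
    at initialization, as described in the paper.
    """
    logits = img_emb @ txt_emb.T / temperature + bias
    signs = 2 * torch.eye(len(logits)) - 1        # +1 diagonal, -1 elsewhere
    return -F.logsigmoid(signs * logits).sum() / len(logits)

# Toy usage on random unit-norm embeddings.
N, d = 8, 512
img = F.normalize(torch.randn(N, d), dim=-1)
txt = F.normalize(torch.randn(N, d), dim=-1)
print(clip_loss(img, txt).item(), siglip_loss(img, txt).item())
```

The practical difference is in how the two losses scale: the softmax normalization couples all N² pairs, so distributed training must assemble the full similarity matrix across devices, while the sigmoid loss scores each pair independently, which is what lets SigLIP push batch sizes higher at lower cost.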
