CLIP Training

Medium · 150 pts · 0 solves

CLIP was trained on 400M image-text pairs: the embeddings of matching pairs are pulled together, while the embeddings of non-matching pairs are pushed apart. Describe the training approach.

Flag format: CONGRESS{contrastive:[objective]}
Example: CONGRESS{contrastive:distinguish_real_from_fake}

Hint

Learn to associate the right image with the right text, and vice versa.
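The push/pull description corresponds to a contrastive loss computed over a batch. Below is a minimal, pure-Python sketch of a symmetric, InfoNCE-style loss. For each image, the matching text is treated as the correct "class" among all texts in the batch, and the same is done from text to image ("and vice versa"). The function names and the fixed temperature of 0.07 are illustrative assumptions. CLIP itself learns its temperature and runs on large batches of encoder outputs, not toy vectors.

```python
import math


def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_rows(logits):
    # Mean cross-entropy where the correct class for row i is column i
    # (i.e. the i-th image belongs with the i-th text).
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    # Hypothetical helper: symmetric contrastive loss over a batch of pairs,
    # where image_embs[i] and text_embs[i] form the matching pair.
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Transpose to score each text against every image.
    logits_t = [list(col) for col in zip(*logits)]
    loss_image_to_text = cross_entropy_rows(logits)
    loss_text_to_image = cross_entropy_rows(logits_t)
    # Average both directions.
    return (loss_image_to_text + loss_text_to_image) / 2


aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, mismatched)
```

In the usage lines, the loss is close to zero when each image lines up with its own text. It is large when the pairs are swapped, so minimizing it pulls matching pairs together and pushes non-matching pairs apart.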