CLIP Training
Medium | 150 pts | 0 solves
CLIP was trained on 400M image-text pairs. The embeddings of matching pairs are pushed together, and the embeddings of non-matching pairs are pushed apart.
Describe the training approach.
Flag format: CONGRESS{contrastive:[objective]}
Example: CONGRESS{contrastive:distinguish_real_from_fake}
Hint
Learn to associate the right image with the right text, and vice versa.
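
To make "pushed together" and "pushed apart" concrete, here is a minimal sketch of a symmetric contrastive loss over a small batch, written in plain Python. It is an illustration under stated assumptions, not CLIP's actual training code: the function names (`contrastive_loss`, `normalize`, `log_softmax`), the toy 2-dimensional embeddings, and the fixed temperature of 0.07 are all hypothetical choices made for this example. Row `k` of the image embeddings is assumed to be paired with row `k` of the text embeddings.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over an image-text similarity matrix.

    Assumption: image_embs[k] and text_embs[k] form the matching pair;
    every other combination in the batch is treated as a non-match.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j] = scaled cosine similarity of image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image -> text: each image should score its own text highest.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text -> image: each text should score its own image highest.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return (loss_i2t + loss_t2i) / 2


# Toy batch: two images, two texts.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]   # text k matches image k
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]  # pairs deliberately mismatched

aligned = contrastive_loss(images, aligned_texts)
shuffled = contrastive_loss(images, shuffled_texts)
print(aligned < shuffled)
```

The loss is low when each image is most similar to its own text and high when the pairing is scrambled, so minimizing it moves matching pairs closer and non-matching pairs further apart, in both the image-to-text and text-to-image directions.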