Multimodal & Vision

CLIP Training

Archive

Medium

150 pts, 32 solves

CLIP was trained on 400M pairs using contrastive learning: pull matching _____(1)-_____(2) pairs together and push non-matching pairs apart.

Flag format: CONGRESS{1:[modality],2:[modality]}

Example: CONGRESS{1:audio,2:transcript}
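For reference, the contrastive objective described above can be sketched in a few lines. This is a minimal, pure-Python illustration, not CLIP's actual training code. The two inputs are left as generic "modality A" and "modality B" embeddings so the blanks stay unfilled. The function names, the toy vectors, and the fixed temperature of 0.07 are assumptions made for the example (CLIP learns its temperature during training).

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    # Mean cross-entropy where the correct "class" for row i is column i,
    # i.e. the i-th item of one modality should match the i-th item of the other.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(emb_a, emb_b, temperature=0.07):
    # emb_a[i] and emb_b[i] are assumed to be a matching pair.
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the A-to-B and B-to-A classification losses.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy batch: two pairs whose embeddings already roughly agree.
matched_a = [[1.0, 0.0], [0.0, 1.0]]
matched_b = [[0.9, 0.1], [0.1, 0.9]]
# The same embeddings with the pairing swapped, so every pair is a mismatch.
shuffled_b = [matched_b[1], matched_b[0]]

loss_matched = symmetric_contrastive_loss(matched_a, matched_b)
loss_shuffled = symmetric_contrastive_loss(matched_a, shuffled_b)
print(loss_matched < loss_shuffled)
```

The loss is low when each item is most similar to its own partner and high when it is more similar to someone else's, which is the "together" and "apart" behaviour the challenge text describes.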

Hint: Learn to associate the right picture with the right description.

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.