Vision-Language Models
Very Easy50 pts0 solves
GPT-4V, Claude 3, and Gemini understand both images and text in one conversation.
What two modalities do they process?
Flag format: CONGRESS{[modality1]_and_[modality2]}
Example: CONGRESS{audio_and_text}
Hint
The 'V' in GPT-4V stands for Vision.