Vision-Language Models

Very Easy50 pts0 solves
GPT-4V, Claude 3, and Gemini understand both images and text in one conversation. What two modalities do they process? Flag format: CONGRESS{[modality1]_and_[modality2]} Example: CONGRESS{audio_and_text}
Hint
The 'V' in GPT-4V stands for Vision.