Three Tokens Of LLaVA Input
Archive · Easy
A LLaVA-style vision-language model (VLM) receives three input streams: _____(1) patches (encoded by CLIP), _____(2) tokens (produced by a tokenizer), and an optional _____(3) template for chat. Fill in the three blanks. Flag format: CONGRESS{1:[word],2:[word],3:[word]}. Example: CONGRESS{1:image,2:text,3:instruction}.
Hint: think of three rough modalities: what you see, what you read, and what you're asked.
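For readers who want to see how the three streams might fit together, here is a minimal sketch, not LLaVA's actual implementation. It assumes a ViT-style encoder that cuts a square image into fixed-size patches, a toy whitespace tokenizer, and a hypothetical chat template with a `<image>` placeholder. The names `num_patches`, `build_sequence`, and `IMAGE_PLACEHOLDER`, and the 336/14 sizes, are illustrative choices only.

```python
# Illustrative sketch only: the names, the whitespace "tokenizer", and the
# template string are assumptions, not LLaVA's real code.

IMAGE_PLACEHOLDER = "<image>"  # hypothetical marker for where patches go


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of square patches a ViT-style encoder cuts an image into."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def build_sequence(template: str, question: str, n_patches: int) -> list[str]:
    """Fill the chat template, then expand the placeholder into patch slots."""
    prompt = template.format(question=question)
    sequence: list[str] = []
    for piece in prompt.split():  # toy whitespace tokenizer
        if piece == IMAGE_PLACEHOLDER:
            # One slot per patch; a real model would put projected
            # vision-encoder features here instead of strings.
            sequence.extend(f"<patch_{i}>" for i in range(n_patches))
        else:
            sequence.append(piece)
    return sequence


template = "USER: <image> {question} ASSISTANT:"  # assumed template
n = num_patches(image_size=336, patch_size=14)
seq = build_sequence(template, "What is shown?", n)
print(n, len(seq))
```

In this sketch the patch slots far outnumber the text pieces, which is why the patch stream usually dominates the sequence length in such a setup.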
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.