The Tiny Tokenizer Of Images
Archive · Medium
BLIP-2 uses a small transformer whose learnable query tokens cross-attend to the output features of a frozen image encoder, producing a short, fixed-length sequence of embeddings that is passed to the LLM. What is this small module called?
Hint: A letter + 'former'.
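For readers who want to see the mechanism in the question, here is a minimal sketch of learnable queries cross-attending to image features. It is an illustration under stated assumptions, not BLIP-2's implementation: it uses a single attention head, random weights, and made-up sizes, and it leaves out the self-attention, feed-forward, and text-side parts of the real module. All function and variable names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, image_feats, w_q, w_k, w_v):
    """Single-head cross-attention: queries read from image features."""
    q = matmul(queries, w_q)        # (num_queries, dim)
    k = matmul(image_feats, w_k)    # (num_patches, dim)
    v = matmul(image_feats, w_v)    # (num_patches, dim)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))  # (num_queries, num_patches)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)       # (num_queries, dim)


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_patches, num_queries, dim = 50, 4, 8  # illustrative sizes only (assumed)

image_feats = rand_matrix(num_patches, dim, rng)  # stand-in for frozen encoder output
queries = rand_matrix(num_queries, dim, rng)      # these would be learned in a real model
w_q, w_k, w_v = (rand_matrix(dim, dim, rng) for _ in range(3))

tokens = cross_attend(queries, image_feats, w_q, w_k, w_v)
print(len(tokens), len(tokens[0]))  # 4 8
```

The point of the sketch is the shape of the result: the number of output tokens equals the number of queries, regardless of how many image patches the encoder produced, which is what makes the stream handed to the LLM compact.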
Archive — no submissions accepted
This challenge is preserved for reference.