Archive
Multimodal & Vision

The Tiny Tokenizer Of Images

Medium
150 pts, 0 solves
BLIP-2 uses a small transformer with learnable query tokens that cross-attend to the frozen vision encoder, producing a compact token stream for the LLM. What is this small module called?
Hint: A letter + 'former'.
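
For reference, the mechanism the question describes (a small set of learnable query vectors cross-attending to frozen image features) can be sketched roughly as below. This is a toy illustration only, not BLIP-2's actual code: it uses a single attention head with no learned projections, random vectors in place of trained queries and real encoder outputs, and the sizes DIM, NUM_PATCHES, and NUM_QUERIES are assumptions picked for the example. The function name cross_attend is hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, image_features):
    """Single-head cross-attention: each query attends over all image features.

    Simplification: queries are used directly as attention queries, and the
    image features serve as both keys and values (no learned projections).
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Scaled dot-product score between this query and every image feature.
        scores = [
            scale * sum(qi * fi for qi, fi in zip(q, feat))
            for feat in image_features
        ]
        weights = softmax(scores)
        # Weighted average of the image features, one output vector per query.
        out = [
            sum(w * feat[d] for w, feat in zip(weights, image_features))
            for d in range(dim)
        ]
        outputs.append(out)
    return outputs


random.seed(0)
DIM = 8            # assumed feature width for the toy example
NUM_PATCHES = 257  # assumed number of image features from the frozen encoder
NUM_QUERIES = 32   # assumed number of learnable queries

# Stand-ins: random vectors instead of real encoder outputs / trained queries.
image_features = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PATCHES)]
queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

compact_tokens = cross_attend(queries, image_features)
print(len(image_features), "image features ->", len(compact_tokens), "tokens")
```

The point of the sketch is the shape change: the output length is set by the number of queries, not by the number of image features, which is why the token stream handed to the LLM stays compact regardless of how many patches the vision encoder emits.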

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.