Archive
Multimodal & Vision

Image Tokenization

Archive
Medium
150pts49 solves
Vision transformers don't process raw pixels directly. What happens to the image first?
Show hint
Like splitting text into tokens, but for images.

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.