Archive
Multimodal & Vision

Document Understanding

Medium
150 pts, 35 solves
A VLM reads a scanned invoice and understands that 'Total: $500' is a summary field based on its visual position. What capability is this?
Hint: It's about what the layout MEANS, not just the words.
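
To make the hint concrete, here is a minimal toy sketch in Python of how position can change the role of the same word. It is not how a VLM works internally. The `Token` class, the normalized coordinates, the `summary_band` threshold, and the role names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """A piece of recognized text plus where it sits on the page (hypothetical)."""
    text: str
    x: float  # left edge, normalized to 0..1 of page width
    y: float  # top edge, normalized to 0..1 of page height


def classify_total(token: Token, summary_band: float = 0.75) -> str:
    """Toy heuristic: text starting with 'total' is a summary field only
    when it appears in the bottom band of the page (assumed threshold)."""
    if not token.text.lower().startswith("total"):
        return "other"
    return "summary_field" if token.y >= summary_band else "body_text"


tokens = [
    Token("Total pages: 3", 0.10, 0.05),  # near the top: not a summary
    Token("Total: $500", 0.70, 0.90),     # bottom right: likely the invoice total
]
roles = [classify_total(t) for t in tokens]
```

Both tokens start with the same word, but only the one near the bottom of the page is labeled as a summary field. A real model learns this kind of cue from data rather than from a fixed threshold.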

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.