Multimodal & Vision

Document Understanding

Archive

Medium

150 pts · 35 solves

A vision-language model (VLM) reads a scanned invoice and recognizes, from its visual position on the page, that 'Total: $500' is a summary field. What capability is this?

Hint: It's about what the layout MEANS, not just the words.
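To make the hint concrete, here is a toy sketch of how the same words can play a different role depending on where they sit on the page. It is not how a VLM works internally; the `Token` class, the `label_role` function, the normalized coordinates, and the position thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """A piece of text with its position on the page (hypothetical structure)."""
    text: str
    x: float  # left edge, normalized 0..1 across the page width
    y: float  # top edge, normalized 0..1 down the page height


def label_role(token: Token) -> str:
    """Toy rule: identical words get a different role depending on position.

    The thresholds below are illustrative assumptions, not values taken
    from any real model or dataset.
    """
    looks_like_total = token.text.lower().startswith("total")
    in_bottom_right = token.y > 0.75 and token.x > 0.5
    if looks_like_total and in_bottom_right:
        return "summary_field"
    return "body_text"


# Same text, two different positions on the page.
bottom_right = Token("Total: $500", x=0.70, y=0.90)
mid_page = Token("Total: $500", x=0.10, y=0.40)

print(label_role(bottom_right))  # summary_field
print(label_role(mid_page))      # body_text
```

A real model learns this kind of association from data rather than from a hand-written rule, but the sketch shows why text alone is not enough to decide what a field is.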

Archive — no submissions accepted

This challenge is preserved for reference.