Document Understanding

Medium · 150 pts · 0 solves

A VLM reads a scanned invoice and understands that 'Total: $500' is a summary field, not a line item, based on its visual position. What does the VLM extract beyond raw text?

Flag format: CONGRESS{extract_[what]}
Example: CONGRESS{extract_text_from_pixels}

Hint

It's about understanding what the layout MEANS, not just reading the words.
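To illustrate why position alone can separate a summary field from a line item, here is a toy, rule-based sketch. It is not how a VLM works internally: a VLM learns such cues from the image itself. The sketch assumes OCR has already produced words with (x, y) coordinates and that the vertical band of the line-item table is known. The Token class, the classify_rows and group_rows functions, the role names, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    x: float  # left edge in page units (hypothetical coordinate system)
    y: float  # top edge; a larger y means further down the page


def group_rows(tokens, tol=5.0):
    """Group tokens whose top edges lie within `tol` of a row's first token."""
    rows = []
    for tok in sorted(tokens, key=lambda t: (t.y, t.x)):
        if rows and abs(rows[-1][0].y - tok.y) <= tol:
            rows[-1].append(tok)
        else:
            rows.append([tok])
    return rows


def classify_rows(tokens, table_top, table_bottom):
    """Label each row only by where it sits relative to the table band.

    The words themselves are ignored on purpose: a row is a line item
    because it is inside the table, and a summary because it is below it.
    """
    labels = []
    for row in group_rows(tokens):
        text = " ".join(t.text for t in sorted(row, key=lambda t: t.x))
        y = row[0].y
        if y < table_top:
            role = "above_table"
        elif y <= table_bottom:
            role = "line_item"
        else:
            role = "summary"
        labels.append((text, role))
    return labels


# Hypothetical OCR output for a small invoice.
tokens = [
    Token("Invoice", 10, 10), Token("#1042", 80, 10),
    Token("Widget", 10, 60), Token("A", 70, 60), Token("$200", 300, 60),
    Token("Widget", 10, 80), Token("B", 70, 80), Token("$300", 300, 80),
    Token("Total:", 200, 140), Token("$500", 300, 140),
]

for text, role in classify_rows(tokens, table_top=50, table_bottom=120):
    print(f"{role:12} {text}")
```

Run as written, this labels "Widget A $200" and "Widget B $300" as line items and "Total: $500" as a summary, even though no rule checks for the word "Total". The hand-set table_top and table_bottom thresholds stand in for spatial reasoning that a VLM would learn rather than receive as input.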