Benchmark Contamination

Medium150 pts0 solves
A model scores 95% on a benchmark but fails on real tasks. The benchmark questions appeared in its training data. What happened? Flag format: CONGRESS{[problem]} Example: CONGRESS{overfitting_to_hyperparameters}
Hint
The test was in the study guide. The score is meaningless.