Benchmark Contamination
Medium150 pts0 solves
A model scores 95% on a benchmark but fails on real tasks. The benchmark questions appeared in its training data.
What happened?
Flag format: CONGRESS{[problem]}
Example: CONGRESS{overfitting_to_hyperparameters}
Hint
The test was in the study guide. The score is meaningless.