Archive
Evaluation & Benchmarks

The Open Reasoning Benchmark With Twenty Thousand Problems

Hard
200 pts · 0 solves
What is the name for a coding eval that continuously scrapes new LeetCode contest problems to prevent training-data contamination?
Hint: A word that describes something happening now, plus the task class.

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.