Evaluation & Benchmarks

Stanford's Holistic Eval

Easy
100 pts, 0 solves
Stanford CRFM's 2022 framework that evaluates many models on many tasks, measuring accuracy, robustness, fairness, bias, toxicity, and efficiency simultaneously. Name it by its four-letter acronym. Flag format: CONGRESS{acronym}. Example: CONGRESS{stanfordbench}.
Hint:
A piece of a knight's armor.

Archive: no submissions accepted.

This challenge is preserved for reference. Play live challenges at /challenges.