Evaluation & Benchmarks

Stanford's Holistic Eval

Archive

Easy

100 pts, 0 solves

Stanford CRFM's 2022 framework for evaluating many models on many tasks across accuracy, robustness, fairness, bias, toxicity, and efficiency simultaneously. Name it by its four-letter acronym. Flag format: CONGRESS{acronym}. Example of the format: CONGRESS{stanfordbench}.

Hint

A piece of a knight's armor.

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.