Stanford's Holistic Eval
Archive | Easy
Stanford CRFM's 2022 framework evaluates many models on many tasks, measuring accuracy, robustness, fairness, bias, toxicity, and efficiency together. Name it by its four-letter acronym. Flag format: CONGRESS{acronym}. Example: CONGRESS{stanfordbench}.
Hint: A piece of a knight's armor.
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.