Archive
Evaluation & Benchmarks

The Eval Suite By Anthropic For Red Teams

Archive
Expert
300pts0 solves
Anthropic's 2024 interpretability paper trained a specific unsupervised method to extract millions of human-interpretable features from Claude 3 Sonnet's residual stream. Flag format: CONGRESS{acronym or full}. Example: CONGRESS{dictionary}.
Show hint
Three letters or three words.

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.