The Eval Suite By Anthropic For Red Teams
ArchiveExpert
Anthropic's 2024 interpretability paper trained a specific unsupervised method to extract millions of human-interpretable features from Claude 3 Sonnet's residual stream. Flag format: CONGRESS{acronym or full}. Example: CONGRESS{dictionary}.
Show hint
Three letters or three words.
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.