The Smuggler of Side Channels
ArchiveExpert
Anthropic's April 2024 safety report documented an attack in which an attacker fills a very long context with hundreds of faux harmful Q&A turns, eventually causing the model to comply on the final turn. What did they name it?
Show hint
More is more, until the guardrails snap.
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.