The Famous Character-String Attack
Archive · Medium
Zou et al. (2023) published a universal adversarial suffix attack on aligned LLMs in which gradient-guided, token-level optimization finds a suffix that elicits harmful output. Name the method (three-letter acronym). Flag format: CONGRESS{acronym}. Example: CONGRESS{pgd}.
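The flag format above can be checked locally before submitting. The sketch below is only an illustration: the function name `is_well_formed_flag` is hypothetical, and it assumes the acronym is exactly three lowercase letters, as in the `CONGRESS{pgd}` example. The actual checker may be more or less strict.

```python
import re

# Assumption: exactly three lowercase ASCII letters inside the braces,
# mirroring the CONGRESS{pgd} example from the challenge text.
FLAG_PATTERN = re.compile(r"CONGRESS\{[a-z]{3}\}")


def is_well_formed_flag(candidate: str) -> bool:
    """Return True if the candidate matches the assumed flag shape."""
    return FLAG_PATTERN.fullmatch(candidate.strip()) is not None
```

This only validates the shape of a flag, not whether the acronym inside it is the correct answer.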
Hint: The acronym stands for Greedy Coordinate Gradient.
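To give a feel for what "greedy coordinate gradient" means, here is a toy sketch of the search loop on a small numeric objective. It is not the authors' implementation and involves no language model: the embedding table, position weights, target vector, and all function names are hypothetical stand-ins. The loop uses the gradient with respect to each position's one-hot token choice to shortlist candidate swaps, evaluates a random batch of single-token swaps exactly, and keeps the best one. One simplification is labeled in the code: a swap is kept only if it lowers the loss.

```python
import random


def residual(suffix, emb, weights, target):
    """Weighted sum of the suffix token embeddings minus the target vector."""
    return [
        sum(w * emb[t][j] for w, t in zip(weights, suffix)) - target[j]
        for j in range(len(target))
    ]


def loss(suffix, emb, weights, target):
    """Toy objective: squared distance to the target (stand-in for a model loss)."""
    return sum(r * r for r in residual(suffix, emb, weights, target))


def top_k_candidates(suffix, emb, weights, target, k):
    """For each position, the k tokens with the most negative one-hot gradient."""
    r = residual(suffix, emb, weights, target)
    candidates = []
    for w in weights:
        # d(loss) / d(one_hot[i][v]) = 2 * weights[i] * dot(emb[v], residual)
        grads = [2 * w * sum(e * x for e, x in zip(emb[v], r)) for v in range(len(emb))]
        candidates.append(sorted(range(len(emb)), key=lambda v: grads[v])[:k])
    return candidates


def gcg_step(suffix, emb, weights, target, k, batch, rng):
    """Try `batch` random single-token swaps from the shortlist; keep the best."""
    candidates = top_k_candidates(suffix, emb, weights, target, k)
    best, best_loss = suffix, loss(suffix, emb, weights, target)
    for _ in range(batch):
        i = rng.randrange(len(suffix))
        trial = list(suffix)
        trial[i] = rng.choice(candidates[i])
        trial_loss = loss(trial, emb, weights, target)
        # Simplification (assumption): only accept swaps that lower the loss.
        if trial_loss < best_loss:
            best, best_loss = trial, trial_loss
    return best, best_loss


rng = random.Random(0)
vocab_size, dim, length = 20, 4, 5
emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
weights = [1.0 / (i + 1) for i in range(length)]
target = [0.5, -0.25, 0.0, 0.75]

suffix = [rng.randrange(vocab_size) for _ in range(length)]
history = [loss(suffix, emb, weights, target)]
for _ in range(30):
    suffix, current = gcg_step(suffix, emb, weights, target, k=4, batch=16, rng=rng)
    history.append(current)

print("final suffix token ids:", suffix)
print("loss: %.4f -> %.4f" % (history[0], history[-1]))
```

The gradient is only a first-order guide for discrete swaps, which is why each shortlisted candidate is re-scored with the exact loss before being accepted.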
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.