AI Security

The Famous Character-String Attack

Archive

Medium

150pts0 solves

Zou et al. (2023) published a universal adversarial suffix attack on aligned LLMs where gradient-guided character-level optimization finds a suffix that elicits harmful output. Name the method (three-letter acronym). Flag format: CONGRESS{acronym}. Example: CONGRESS{pgd}.

Show hint

Stands for Greedy Coordinate Gradient.

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.