The 2024 Benchmark Of Deception
Archive · Expert
Debenedetti et al. (2024) released a benchmark suite of 629 realistic agent tasks, half of which contain indirect prompt-injection attacks, to measure attack success rate and agent robustness. Name the benchmark. Flag format: CONGRESS{name}. Example: CONGRESS{agentbench}.
Hint: Agent + martial-arts training space.
Archive — no submissions accepted
This challenge is preserved for reference.