Training Data Poisoning
Hard200 pts0 solves
An attacker contributes poisoned examples to an open-source dataset used for fine-tuning. The model learns a hidden behavior triggered by a specific phrase.
What is this attack called?
Flag format: CONGRESS{attack_in_snake_case}
Hint
The attack plants a hidden trigger in the training data.