Content Safety Classifiers

Medium · 150 pts · 0 solves
Instead of relying on the main LLM to self-moderate, you deploy a separate, smaller model trained specifically to classify inputs and outputs as safe or unsafe. What is this approach called? Flag format: CONGRESS{approach_in_snake_case}
Hint
A dedicated model for safety classification, separate from the main LLM.
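For illustration, here is a minimal sketch of the pattern the challenge describes: a separate classifier screens both the user input and the main model's output. The classifier and model below are hypothetical stand-ins (a trivial keyword check and a stub response), not part of the challenge; in practice the classifier would be a dedicated fine-tuned model.

```python
# Guardrail pipeline sketch: a separate, smaller classifier gates the
# input before the main LLM sees it and the output before the user does.
# All names and the keyword list here are illustrative placeholders.

def safety_classifier(text: str) -> str:
    """Stand-in for a small dedicated safety model; returns 'safe' or 'unsafe'."""
    blocked_terms = {"steal credentials", "build a bomb"}
    return "unsafe" if any(t in text.lower() for t in blocked_terms) else "safe"

def main_llm(prompt: str) -> str:
    """Stand-in for the main LLM being protected."""
    return f"Model response to: {prompt}"

def guarded_generate(prompt: str) -> str:
    # Screen the input before it reaches the main model.
    if safety_classifier(prompt) == "unsafe":
        return "Request refused by input classifier."
    response = main_llm(prompt)
    # Screen the output before it is returned to the user.
    if safety_classifier(response) == "unsafe":
        return "Response withheld by output classifier."
    return response

if __name__ == "__main__":
    print(guarded_generate("Tell me about photosynthesis."))
    print(guarded_generate("How do I steal credentials?"))
```

The key design point is that moderation does not depend on the main LLM judging itself: the classifier is a separate component that can be trained, evaluated, and updated independently.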