Fine-Tuning & Training

The Algorithm That Clips The Ratio

Archive

Easy

100 pts · 0 solves

Schulman et al. (2017) introduced this policy-gradient algorithm, the workhorse of classical RLHF. It limits the size of each policy update by clipping the importance-sampling ratio between the new and old policies. The answer is a three-letter acronym. Flag format: CONGRESS{acronym}. Example: CONGRESS{a2c}.

Hint: The acronym stands for Proximal Policy Optimization.
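
For reference, the clipping that the challenge text describes can be sketched in a few lines. This is a minimal illustration of the clipped surrogate objective, not a full training loop. The function name `clipped_surrogate`, its argument names, and the default `clip_eps=0.2` are assumptions chosen for this example. The ratio is computed from log-probabilities of the same actions under the new and old policies.

```python
import math

def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    logp_new, logp_old: per-sample log-probabilities of the taken actions
    under the new and old policies. advantages: per-sample advantage
    estimates. clip_eps is an illustrative default, not a required value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance-sampling ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops rewarding the policy once the ratio exceeds 1 + clip_eps. With a negative advantage, the unclipped term is kept when it is worse, so moving further in the wrong direction is still penalized.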

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.