Archive
Fine-Tuning & Training

The Algorithm That Clips The Ratio

Easy
100 pts, 0 solves
The policy-gradient algorithm introduced by Schulman et al. (2017), the workhorse of classical RLHF, limits the size of each policy update by clipping the importance-sampling ratio. The answer is its three-letter acronym. Flag format: CONGRESS{acronym}. Example: CONGRESS{a2c}.
Hint: The acronym stands for Proximal Policy Optimization.
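
For reference, the clipping mentioned in the challenge text can be sketched for a single sample as below. This is a minimal illustration under stated assumptions, not a full training loop: the function name `clipped_surrogate`, its argument names, and the default `eps=0.2` (a commonly used value) are chosen for this example only. It takes the log-probability of an action under the new and old policies plus an advantage estimate, and returns the per-sample objective to be maximized.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    """Per-sample clipped surrogate objective (hypothetical helper).

    logp_new / logp_old: log-probability of the taken action under the
    new and old policy. advantage: advantage estimate for that action.
    eps: clip range (assumed default, not taken from the challenge text).
    """
    # Importance-sampling ratio pi_new(a|s) / pi_old(a|s).
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to the interval [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, a ratio above `1 + eps` earns no extra objective, so there is no incentive to move the policy further in that direction. With a negative advantage the same holds for ratios below `1 - eps`. When the ratio moves the wrong way (for example, a ratio above `1 + eps` with a negative advantage), the unclipped term is the smaller one and is kept, so the penalty is not hidden by the clip.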

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.