The Algorithm That Clips The Ratio
Archive · Easy
The policy-gradient algorithm introduced by Schulman et al. (2017), the workhorse of classical RLHF, limits policy updates by clipping the importance-sampling ratio. The answer is its three-letter acronym. Flag format: CONGRESS{acronym}. Example: CONGRESS{a2c}.
Hint: It stands for Proximal Policy Optimization.
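For reference, the clipped importance-sampling ratio mentioned in the challenge text can be sketched in a few lines of plain Python. This is a minimal illustration, not a training-ready implementation: the function name clipped_surrogate, its argument names, and the default eps=0.2 are assumptions chosen for this example. It computes the mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the probability ratio between the new and old policy and A is the advantage estimate.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    Hypothetical helper for illustration only; eps=0.2 is an assumed default.
    logp_new / logp_old: per-sample log-probabilities of the taken actions
    under the new and old policies. advantages: per-sample advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance-sampling ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio above 1 + eps earns no extra objective value, so the update has no incentive to move the policy further in that direction. With a negative advantage and a large ratio, the unclipped term is the smaller one and is kept, so harmful moves are still fully penalized.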
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.