Fine-Tuning & Training

The Data Scaling Law Of Chinchilla

Archive

Expert

300pts0 solves

Hoffmann et al. (2022) found a compute-optimal ratio between training tokens and parameters for dense autoregressive models. What is the approximate tokens-per-parameter ratio they reported?

Show hint

A number between 10 and 30.

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.