The Data Scaling Law Of Chinchilla
ArchiveExpert
Hoffmann et al. (2022) found a compute-optimal ratio between training tokens and parameters for dense autoregressive models. What is the approximate tokens-per-parameter ratio they reported?
Show hint
A number between 10 and 30.
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.