Find the optimal model size for your training budget, or the optimal token count for your model. Based on Hoffmann et al. 2022.
Your model and training token count are well-balanced according to Chinchilla scaling laws. For an 85M-parameter model, the optimal training budget is 1.7B tokens, which is 20 tokens per parameter.
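The arithmetic behind this result can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation. It assumes a fixed ratio of 20 tokens per parameter, which is the ratio implied by the 85M-parameter, 1.7B-token figures above. The function names, the status labels, and the 20% tolerance band are hypothetical choices made for this example.

```python
# Assumption: a fixed tokens-per-parameter ratio of 20, as implied by
# the 85M-parameter / 1.7B-token example in the text.
TOKENS_PER_PARAM = 20.0


def optimal_tokens(n_params: float, ratio: float = TOKENS_PER_PARAM) -> float:
    """Token count that matches a given model size at the assumed ratio."""
    return n_params * ratio


def optimal_params(n_tokens: float, ratio: float = TOKENS_PER_PARAM) -> float:
    """Model size that matches a given token budget at the assumed ratio."""
    return n_tokens / ratio


def balance_status(
    n_params: float,
    n_tokens: float,
    ratio: float = TOKENS_PER_PARAM,
    tolerance: float = 0.2,  # hypothetical band; not taken from the source
) -> str:
    """Classify a (model size, token count) pair relative to the assumed ratio."""
    actual = n_tokens / n_params
    if actual < ratio * (1 - tolerance):
        return "under-trained"
    if actual > ratio * (1 + tolerance):
        return "over-trained"
    return "well-balanced"


if __name__ == "__main__":
    params = 85e6
    tokens = optimal_tokens(params)
    print(f"{params:.3g} parameters -> {tokens:.3g} tokens")
    print(balance_status(params, tokens))
```

With these assumptions, `optimal_tokens(85e6)` gives 1.7 billion tokens, and `optimal_params` inverts the same relation for a fixed token budget. A single fixed ratio is a simplification, so the output is best treated as a rough guide.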
| Model | Parameters | Training Tokens | Token:Param Ratio | Status |
|---|---|---|---|---|