From Chinchilla’s 20:1 rule to SmolLM3’s 3,700:1 ratio: how inference economics rewrote the training playbook

6 min read
Training a language model is expensive. Really expensive. A single training run for a 70 billion parameter model can cost millions of dollars in compute.

So before you spin up a cluster of GPUs and start burning through your budget, you need to answer a fundamental question: given a fixed amount of compute, should you train a larger model on less data, or a smaller model on more data?
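Chinchilla's 20:1 rule from the title gives the classical answer: train on roughly 20 tokens per parameter. Here is a minimal sketch of that trade-off, assuming the standard approximation that training compute C ≈ 6·N·D (N parameters, D training tokens); the FLOP budget and function name below are illustrative, not from this article.

```python
# Sketch: for a fixed FLOP budget, how big should the model be and how much
# data should it see under Chinchilla's ~20 tokens-per-parameter rule?
# Assumes training compute C ≈ 6 * N * D.

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Solve C = 6 * N * D with D = 20 * N, i.e. N = sqrt(C / 120)."""
    n = (compute_flops / 120) ** 0.5  # parameters
    d = 20 * n                        # training tokens
    return n, d

if __name__ == "__main__":
    budget = 1e24  # illustrative compute budget in FLOPs
    n, d = chinchilla_optimal(budget)
    print(f"Budget: {budget:.1e} FLOPs")
    print(f"Compute-optimal size: {n / 1e9:.0f}B parameters")
    print(f"Compute-optimal data: {d / 1e12:.1f}T tokens")
```

For a 1e24 FLOP budget this lands around a 90B-parameter model trained on roughly 1.8T tokens; compare that with SmolLM3's roughly 3,700 tokens per parameter, which deliberately overtrains a small model because inference cost, not training cost, dominates the bill.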

This isn’t just an academic curiosity. Get it right, and you can punch way above your weight, creating models that compete with much more expensive alternatives.

This is where scaling laws come in. They’re em…
