Big Blob of Compute Hypothesis
Dario Amodei’s scaling thesis, written up in an internal doc in 2017 (when GPT-1 had just appeared) and still the lens he uses. The claim: ‘all the cleverness, all the techniques, all the “we need a new method”, that doesn’t matter very much.’ A handful of ingredients determine results.
The ingredients
Dario lists roughly seven:
- Raw compute — how much you have.
- Quantity of data.
- Quality and distribution of data — it must be a broad distribution.
- Training length — how long you train.
- An objective function that scales ‘to the moon’ — pre-training’s next-token objective is one; the RL objective (‘reach the goal’) is another, with objective rewards (maths, code) and subjective ones (RLHF and higher-order versions).
- Normalisation and conditioning (his sixth and seventh items): numerical stability, so that ‘the big blob of compute flows in this laminar way’ rather than hitting problems.
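To make the normalisation item concrete, here is a minimal sketch of why numerical conditioning matters in a deep stack. It is only an illustration under stated assumptions: `toy_layer`, the gain of 1.5, the depth of 50 and the input vector are all hypothetical, and RMS normalisation is used as one simple example of the family of techniques, not as the method Dario describes.

```python
import math

def rms(v):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v) / len(v))

def rms_normalise(v, eps=1e-8):
    """Rescale a vector so its RMS is approximately 1."""
    scale = rms(v) + eps
    return [x / scale for x in v]

def toy_layer(v, gain):
    """Hypothetical stand-in for a learned layer: mix each element
    with its neighbour, then rescale by a fixed gain."""
    n = len(v)
    return [gain * (v[i] + 0.5 * v[(i + 1) % n]) for i in range(n)]

def run_stack(v, depth, gain, normalise):
    """Pass v through `depth` toy layers and return the final RMS."""
    for _ in range(depth):
        v = toy_layer(v, gain)
        if normalise:
            v = rms_normalise(v)
    return rms(v)

x = [1.0, -2.0, 0.5, 3.0]  # arbitrary illustrative input
raw = run_stack(x, depth=50, gain=1.5, normalise=False)
stable = run_stack(x, depth=50, gain=1.5, normalise=True)

print(f"RMS without normalisation: {raw:.3e}")
print(f"RMS with normalisation:    {stable:.6f}")
```

Without the normalisation step the signal grows by many orders of magnitude across the stack; with it, the magnitude stays close to 1 at every depth, which is the sense in which the compute can flow smoothly rather than hitting numerical problems.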
The 2017 document was written as a general one, not about language models specifically: robotics, AlphaGo/Dota-style RL, and AlphaStar were all live at the time. Rich Sutton’s Bitter Lesson arrived ‘a couple of years later’ as a parallel, more famous formulation of the same idea.
RL is ‘just the same’
Dario treats RL-versus-pre-training as ‘a red herring’. The history rhymes: GPT-1 trained on narrow fanfiction data didn’t generalise; GPT-2 trained on a broad internet scrape did. RL is now repeating that arc — from narrow tasks (maths competitions, where performance is ‘log-linear in how long we’ve trained it’) to broad ones (code, then many others) — and ‘we’re seeing the same scaling in RL that we saw for pre-training’. The point of building RL environments is not to teach every skill but to reach generalisation, exactly as with pre-training data.
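As a rough illustration of what ‘log-linear in how long we’ve trained it’ means, the sketch below fits score = a + b · ln(steps) by ordinary least squares. The step counts and scores are synthetic, hypothetical numbers chosen to lie exactly on such a curve; they are not measurements from any model, and `fit_log_linear` is an illustrative helper rather than anything from the source.

```python
import math

def fit_log_linear(steps, scores):
    """Least-squares fit of score = a + b * ln(steps); returns (a, b)."""
    xs = [math.log(s) for s in steps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic, hypothetical data: each 10x increase in training steps
# adds the same 0.15 to the score.
steps = [100, 1_000, 10_000, 100_000]
scores = [0.20, 0.35, 0.50, 0.65]

a, b = fit_log_linear(steps, scores)
gain_per_10x = b * math.log(10)  # score gained per tenfold increase in steps

print(f"intercept a = {a:.3f}")
print(f"gain per 10x training = {gain_per_10x:.3f}")
```

The practical reading of a log-linear trend is that equal gains require multiplicative, not additive, increases in training: in this toy data, each further 0.15 of score costs ten times as many steps as the last.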
The sample-efficiency puzzle
The one thing the hypothesis doesn’t dissolve is why models need trillions of tokens when humans don’t. Dario’s resolution: pre-training and RL sit between human evolution and human lifetime-learning, and in-context learning sits between long- and short-term human learning — a hierarchy in which the LLM phases fall between the human points rather than matching any of them. Many apparent ‘barriers’ (syntax-not-semantics, ‘can’t reason’) have, he notes, historically ‘dissolved within the big blob of compute’.
See also
- Dario Amodei on the End of the Exponential, AI Diffusion, and the Economics of AGI — source episode
- Bitter Lesson — Sutton’s parallel, later formulation
- Scaling Laws — the measured curves the hypothesis predicts
- Country of Geniuses in a Data Center — where Dario thinks the climb leads
- Reinforcement Learning from Human Feedback