Big Blob of Compute Hypothesis

Dario Amodei’s scaling thesis, written up in an internal doc in 2017 (when GPT-1 had just appeared) and still the lens he uses. The claim: ‘all the cleverness, all the techniques, all the “we need a new method”, that doesn’t matter very much.’ A handful of ingredients determine results.

The ingredients

Dario lists roughly seven:

1. Raw compute: how much you have.
2. Quantity of data.
3. Quality and distribution of data: it must be a broad distribution.
4. Training length: how long you train.
5. An objective function that scales ‘to the moon’: pre-training’s next-token objective is one; the RL objective (‘reach the goal’) is another, with objective rewards (maths, code) and subjective ones (RLHF and higher-order versions).
6 and 7. Normalisation / conditioning: numerical stability, so that ‘the big blob of compute flows in this laminar way’ rather than hitting problems.

The doc was written as a general document, not about language models specifically: robotics, AlphaGo/Dota-style RL, and AlphaStar were all live at the time. Rich Sutton’s Bitter Lesson arrived ‘a couple of years later’ as a parallel, more famous formulation of the same idea.

RL is ‘just the same’

Dario treats RL-versus-pre-training as ‘a red herring’. The history rhymes: GPT-1 trained on narrow fanfiction data didn’t generalise; GPT-2 trained on a broad internet scrape did. RL is now repeating that arc — from narrow tasks (maths competitions, where performance is ‘log-linear in how long we’ve trained it’) to broad ones (code, then many others) — and ‘we’re seeing the same scaling in RL that we saw for pre-training’. The point of building RL environments is not to teach every skill but to reach generalisation, exactly as with pre-training data.
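
To make ‘log-linear in how long we’ve trained it’ concrete: the phrase means the score rises by a roughly constant amount each time training length is multiplied by a constant factor, i.e. score ≈ a + b · log10(steps). The Python sketch below fits that form by ordinary least squares. The function name fit_log_linear, the variable names and the four data points are all invented for illustration; the numbers are synthetic, not results from any model or anything Dario reported.

```python
import math


def fit_log_linear(steps, scores):
    """Least-squares fit of score = a + b * log10(steps).

    Hypothetical helper for illustration only.
    """
    xs = [math.log10(s) for s in steps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic, made-up data: each tenfold increase in steps adds the same
# number of points, which is what "log-linear" looks like.
steps = [100, 1_000, 10_000, 100_000]
scores = [20.0, 30.0, 40.0, 50.0]

a, b = fit_log_linear(steps, scores)

# Extrapolate one more tenfold increase under the fitted curve.
predicted = a + b * math.log10(1_000_000)
print(f"intercept={a:.2f}, slope per 10x steps={b:.2f}, predicted={predicted:.2f}")
```

On these synthetic points the slope comes out at about 10 points per tenfold increase in steps. On a linear time axis the same curve flattens, so each further gain of the same size costs ten times more training.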

The sample-efficiency puzzle

The one thing the hypothesis doesn’t dissolve is why models need trillions of tokens when humans don’t. Dario’s resolution: pre-training and RL sit between human evolution and human lifetime-learning, and in-context learning sits between long- and short-term human learning — a hierarchy in which the LLM phases fall between the human points rather than matching any of them. Many apparent ‘barriers’ (syntax-not-semantics, ‘can’t reason’) have, he notes, historically ‘dissolved within the big blob of compute’.
