Age of Research

Ilya Sutskever’s periodisation of deep learning, and his claim that the field has re-entered an age of research after roughly five years (2020–2025) in which scaling dominated.

The three eras

  • 2012–2020: age of research. Ideas mattered; compute was the bottleneck. People ‘tinkered with stuff and tried to get interesting results’: AlexNet, the transformer, the run-up to scaling.
  • 2020–2025: age of scaling. Scaling Laws and GPT-3 produced ‘the scaling insight’. ‘Scaling’ became a single, powerful word that told everyone what to do: add compute and data to the pre-training recipe, and results reliably improve. For companies it was low-risk capital allocation, far easier than asking researchers to ‘go forth and research’.
  • 2025+: back to research, ‘just with big computers’. The recipe has been applied and the models are large; 100× more of the same would not transform things. Pre-training data is finite and ‘will run out’. Further gains require new ideas, not more of the known recipe.

Why the shift matters

Scaling ‘sucked out all the air in the room’: because the recipe was known, everyone did the same thing, and the world ended up with ‘more companies than ideas by quite a bit’. In an age of research the directive is no longer clear, so the differentiator moves from compute back to insight — ‘if ideas are so cheap, how come no one’s having any ideas?’

Sutskever argues this also reopens the compute question: research does not need the largest cluster, only enough compute to make a convincing case. AlexNet ran on two GPUs, and no experiment in the original transformer paper used more than 64 GPUs of 2017 vintage (‘two GPUs of today’). What gets scaled has already shifted once, from pre-training to RL, which at some labs now consumes more compute than pre-training does. He expects it to keep shifting as the field rediscovers ‘let’s try this and this and this’.

Where mainstream views differ

This cuts against the prevailing lab thesis that scaling the current paradigm still has years of predictable gains left and that compute is the decisive moat. Scaling optimists point to continued returns from RL and inference-time compute, and to reasoning models as evidence that the curve has not bent. Sutskever’s counter is not that scaling stops working but that each added unit of compute stops being transformative: the leverage returns to ideas. The claim is hard to falsify in advance, and it comes from someone whose company (SSI, $3bn raised) depends on it being true; he argues that SSI’s research compute is comparable to rivals’ once their inference and product spend is netted out. Compare the adjacent but distinct argument of the Bitter Lesson, which holds that general methods leveraging computation keep winning.
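
A toy sketch of the difference between ‘scaling stops working’ and ‘scaling stops being transformative per added unit’. It assumes, purely for illustration, that loss falls as a power law in compute above an irreducible floor, the general functional form used in the scaling-laws literature. The constants `L_INF`, `A` and `ALPHA` and the helper names are hypothetical, not fitted values from any paper, and loss is only a proxy for capability.

```python
# Toy power-law scaling model: loss = L_INF + A * compute ** -ALPHA.
# All constants are hypothetical and chosen only for illustration;
# compute is in arbitrary units.
L_INF = 1.7   # irreducible loss floor (hypothetical)
A = 10.0      # size of the reducible term at compute = 1 (hypothetical)
ALPHA = 0.05  # power-law exponent (hypothetical)


def loss(compute: float) -> float:
    """Predicted loss at a compute budget under the toy power law."""
    return L_INF + A * compute ** (-ALPHA)


def gain_from_multiplier(compute: float, multiplier: float) -> float:
    """Absolute drop in loss from multiplying the compute budget."""
    return loss(compute) - loss(compute * multiplier)


if __name__ == "__main__":
    # Loss keeps falling with every 100x step, but the absolute gain per
    # step shrinks by a factor of 100 ** -ALPHA each time.
    budget = 1.0
    for _ in range(4):
        gain = gain_from_multiplier(budget, 100.0)
        print(f"100x from {budget:.0e}: loss {loss(budget):.3f} -> "
              f"{loss(budget * 100.0):.3f} (gain {gain:.3f})")
        budget *= 100.0
```

Under this assumption every 100× step still lowers loss, which is the optimists’ point, yet each step buys a smaller absolute improvement than the last, which is one way to read Sutskever’s ‘per added unit’ claim. The toy model does not settle which reading matters; that depends on how loss maps to capability.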

See also