Notes — Andrej Karpathy on AGI Timelines, RL’s Limits, and the Future of Education

Notes on Andrej Karpathy in conversation with Dwarkesh Patel — Dwarkesh Podcast, October 2025.

Four questions — Adler’s reading frame

Four questions from Mortimer Adler’s ‘How to Read a Book’: what is it about, how is it argued, is it true, and what of it?

Q1 — What is it about as a whole? Why current AI is impressive but not AGI, and why the gap closes over a decade rather than a year. Karpathy diagnoses the deficits — brittle reinforcement learning, model collapse, no continual learning, no machine culture — then argues the payoff diffuses gradually into the existing 2% GDP exponential rather than arriving as a discontinuity. Closes on his turn to education.

Q2 — How is it argued? By a practitioner’s intuition calibrated against fifteen years in the field, not by formal models. Karpathy reasons through concrete artefacts he has just built (nanochat, the 1989 LeCun reproduction), evocative analogies (‘sucking supervision through a straw’, the ‘march of nines’, ‘crappy evolution’), and repeated translation-in-time arguments (‘where were we ten years ago?’). Dwarkesh supplies the adversarial pressure — pushing the discrete-takeoff and Sutton-animal positions hard.

Q3 — Is it true, in whole or part? The deficits are well-grounded and consistent with what the four standalone talks describe (Jagged Intelligence, hazy-recollection memory, the gameability of Reinforcement Learning from Human Feedback). The decade timeline is explicitly intuition, not derivation — Karpathy says so. The ‘AGI blends into 2% growth’ thesis is the most contestable claim and is contested live: Dwarkesh’s Industrial-Revolution rebuttal (0.2%→2% was a regime change) is not fully answered. [?] The cognitive-core size estimate (~1B parameters in 20 years) is offered as a contrarian guess.

Q4 — What of it? For forecasting: the bottleneck is reliability on novel work, which improves a ‘nine’ at a time, so deployment lags demos by years. For builders: capability clusters where text and verification are cheap (code), so non-text and high-entropy domains stay hard. For everyone: Karpathy’s bet is that the human payoff is education — ‘pre-AGI education is useful, post-AGI education is fun.’

Glossary

Decade of agents — Karpathy’s correction to ‘year of agents’: useful agents exist (Claude, Codex) but reaching employee-grade reliability is roughly a decade of work.
Ghosts / spirits — his name for LLMs: digital entities grown by imitating human internet text, as opposed to animals, grown by evolution with hardware baked in. Pre-training is ‘crappy evolution’ — the practically achievable substitute.
Cognitive core — the intelligence stripped of memorised knowledge: the algorithms for thought and problem-solving, kept while the encyclopaedic recall is removed so the model must look facts up.
Sucking supervision through a straw — the waste in outcome-based RL: a whole multi-minute trajectory is up- or down-weighted on a single final-reward bit, broadcast across every token.
Model collapse — the silent low-entropy failure of model samples: they occupy a tiny manifold (‘ChatGPT only knows three jokes’), so training on self-generated data degrades the model.
March of nines — reliability advances one nine at a time (90% → 99% → 99.9%), each nine a roughly constant quantum of work; the source of the demo-to-product gap.
Autonomy slider — the gradual handover of low-level work to automation while humans abstract upward (assembly → compilers → agents); not a discrete replacement.
Translation invariance in time — his forecasting heuristic: ask where the field was N years ago, expect comparable directional change N years out.
Eureka / Starfleet Academy — his education venture: an elite institution building ‘ramps to knowledge’ that maximise ‘eurekas per second.’

Key claims by section

Timelines and the agent stack [§ AGI is still a decade away]

‘Decade of agents, not year of agents’ is a reaction to industry over-prediction. Think of an agent as an intern you might hire: today you would not, because agents lack sufficient intelligence, multimodality, computer use, and continual learning.
The decade figure is admitted intuition from ~15 years in AI, not a derivation. ‘The problems are tractable, surmountable, but still difficult.’
Historical missteps: Atari deep RL (2013) and OpenAI’s Universe (a keyboard-and-mouse web agent, his own project) went after agents too early — reward too sparse, no representational substrate. You must build the LLM/pre-training representations first, then tack agents on top.
We are building ghosts, not animals: imitation of internet documents, not evolution. The zebra runs minutes after birth — baked-in, not RL. Pre-training is ‘crappy evolution.’
Pre-training does two unrelated things: acquires knowledge and boots up intelligence (the circuits behind in-context learning). Karpathy wants to keep the second and strip the first — the cognitive core.
Knowledge in weights is ‘a hazy recollection’ (15T tokens compressed to a few billion parameters); tokens in the Context Window are directly-accessible ‘working memory’ (the KV cache). Give the model the full chapter, not its memory of the book.
‘If I can’t build it, I don’t understand it’ (Feynman). nanochat (~8,000 lines) is the simplest complete ChatGPT-clone pipeline; learn by rebuilding from scratch, reference-but-don’t-copy-paste.
Reproducing LeCun’s 1989 convnet: 33 years of algorithms alone halved the error; further gains needed 10× data and more compute. ‘Everything plus 20%’ — no single factor dominates progress. In 10 years, still giant nets trained by gradient descent, but bigger and tweaked.
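The ‘hazy recollection’ point above can be made concrete with a back-of-envelope ratio of weight storage to training tokens. The sketch below is illustrative only: the 3-billion-parameter count stands in for ‘a few billion’ and the 16-bit weight size is an assumption; only the 15T token count comes from these notes.

```python
def weight_bits_per_token(num_params: float, bits_per_param: float, num_tokens: float) -> float:
    """Upper bound on weight storage available per training token, in bits."""
    return num_params * bits_per_param / num_tokens


# Only the token count is taken from the notes; the other two values are assumptions.
TRAINING_TOKENS = 15e12        # '15T tokens'
ASSUMED_PARAMS = 3e9           # hypothetical stand-in for 'a few billion parameters'
ASSUMED_BITS_PER_PARAM = 16    # assumption: 16-bit weights

budget = weight_bits_per_token(ASSUMED_PARAMS, ASSUMED_BITS_PER_PARAM, TRAINING_TOKENS)
print(f"{budget:.4f} bits of weight storage per training token")
```

Under these assumptions each training token is left with a small fraction of one bit in the weights, which is one way to see why recall from weights is lossy while text placed in the context window is available verbatim.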

What the models can’t do [§ LLM cognitive deficits]

Three coding modes: reject LLMs / autocomplete (his sweet spot — high-bandwidth, point and type) / vibe coding via agents.
Agents excel at boilerplate and patterns common on the internet; they fail on unique, ‘intellectually intense’ code. On nanochat they forced the PyTorch DDP container he had deliberately replaced, added over-defensive try/catch, used deprecated APIs, and bloated the codebase. ‘It’s slop… not net useful.’
The crux for timelines: models ‘are not very good at code that has never been written before’ — exactly what frontier research demands. This undercuts the fast AI-automates-AI-research takeoff story.
He treats GPT-5 Pro as ‘the oracle’ — paste the whole repo for a hard 20-minute question. The industry overstates the present (‘maybe trying to fundraise’).
AI is a continuum of computing automation (syntax highlighting, type checkers, search ranking are all ‘AI’); the autonomy slider raises the human’s level of abstraction rather than replacing the human at a stroke.

Why RL is the wrong tool, barely [§ RL is terrible]

‘Reinforcement learning is terrible. It just so happens that everything we had before it is much worse.’
Mechanism of outcome-based RL: run hundreds of rollouts, check the final answer, then up-weight every token of the winning trajectories — including the wrong turns taken before stumbling onto the answer. High-variance, noisy, ‘stupid and crazy.’ A human would neither do hundreds of rollouts nor credit every step equally.
Humans instead review — which parts went well, which badly. LLMs have no equivalent.
InstructGPT was the mind-blowing result: fine-tune an autocomplete base model on conversation-shaped text and it becomes a conversational assistant while keeping pre-training knowledge.
Process-based supervision is hard because partial-credit assignment is hard, and LLM judges are gameable: RL finds adversarial examples like ‘dhdhdhdh’ that the judge scores 100%. You can patch each, but they are infinite — so you can run ~10–20 steps, not hundreds. Needs ‘three or four or five more’ major algorithmic ideas.
See Reinforcement Learning from Human Feedback — the same reward-hacking failure he details in Deep Dive into LLMs like ChatGPT.
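The ‘straw’ problem in this section can be sketched in a few lines. This is a minimal illustration, assuming a simple mean-baseline advantage; it is not a specific algorithm named in the conversation, and the token names and rollouts are hypothetical. The point is only that one final-reward number is copied onto every token of a trajectory, wrong turns included.

```python
from statistics import mean


def broadcast_outcome_reward(rollouts):
    """Give every token in a rollout the same weight: final reward minus the batch mean.

    rollouts: list of (tokens, final_reward) pairs.
    Returns one list of (token, weight) pairs per rollout.
    """
    baseline = mean(reward for _, reward in rollouts)
    weighted = []
    for tokens, reward in rollouts:
        advantage = reward - baseline  # a single scalar for the whole trajectory
        weighted.append([(token, advantage) for token in tokens])
    return weighted


# Hypothetical rollouts: only the first reaches the right answer, after a wrong turn.
rollouts = [
    (["try_x", "wrong_turn", "backtrack", "answer=42"], 1.0),
    (["try_y", "answer=41"], 0.0),
    (["try_z", "answer=40"], 0.0),
    (["try_w", "answer=7"], 0.0),
]
weights = broadcast_outcome_reward(rollouts)
for token, weight in weights[0]:
    print(f"{token:>10}  {weight:+.2f}")
```

In the winning rollout the ‘wrong_turn’ token is up-weighted exactly as much as the final answer. A process-based scheme would need a per-step signal, which is the credit-assignment difficulty described above.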

The human learning gap [§ How do humans learn?]

Reading a book: for a human it is a set of prompts for synthetic-data generation (you reconcile it with what you know, discuss it); for an LLM it is flat next-token prediction. We lack a pre-training stage that ‘thinks through’ material.
Model collapse: samples occupy a tiny manifold and are silently low-entropy — ask for a joke ten times, get the same three. Train on your own samples too long and you degrade. Humans collapse too over a lifetime; children are ‘not yet collapsed’, which is why they say shocking things. Entropy must be sought — talking to people is a source.
Poor human memorisation is a feature: it forces generalisation, ‘seeing the forest for the trees.’ LLMs memorise too well and are distracted by it — hence the cognitive core.
Internet pre-training data is ‘terrible’ (stock tickers, slop), so big models are needed mostly for memory work; refine the dataset and the model can shrink. Contrarian guess: a ~1B-parameter cognitive core in 20 years could hold a productive conversation and look facts up.
Frontier models grew then shrank: pre-training FLOPs are not the most cost-effective spend, so labs cut them and make it up in RL and mid-training. ‘Everything plus 20%’; datasets will improve most.
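The ‘silently low-entropy’ claim about model samples can be checked with an empirical entropy estimate. The sketch below is a measurement aid under toy assumptions, not a method from the conversation: the joke labels are hypothetical placeholders for sampled outputs.

```python
import math
from collections import Counter


def empirical_entropy_bits(samples):
    """Shannon entropy (in bits) of the empirical distribution over samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical data: ten requests for a joke.
collapsed = ["joke_a"] * 5 + ["joke_b"] * 3 + ["joke_c"] * 2   # the same three jokes
diverse = [f"joke_{i}" for i in range(10)]                     # ten distinct jokes

print(f"collapsed: {empirical_entropy_bits(collapsed):.2f} bits")
print(f"diverse:   {empirical_entropy_bits(diverse):.2f} bits")
```

Each collapsed sample looks fine on its own; the problem only shows up in the distribution, which is why training on such samples can degrade a model without any single example looking wrong.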

Why the payoff diffuses [§ AGI will blend into 2% GDP growth]

Rejects a single AGI progress axis (education level, horizon length); sees AI as an extension of computing. Original OpenAI definition: any economically valuable task at human level. The common concession of restricting this to knowledge work already excludes most of the economy; what remains is roughly 10–20%, still trillions.
Geoff Hinton’s radiologist prediction was wrong: the job is messy and the profession is growing. Call-centre work is more automatable (short task horizon, low context, purely digital). Expect the autonomy slider: AIs handle ~80% of volume, humans supervise teams of five AIs.
Coding is the perfect first domain: built around text, abundant data, pre-built infrastructure (IDEs, diffs). Slides and other domains lack this. But even pure language-in/out tasks resist value — transcript editing, and Andy Matuschak’s spaced-repetition cards (‘50 billion things’ tried). Code is structured; prose is high-entropy.
The GDP thesis: you cannot ‘find’ computers, mobile, or the iPhone in the GDP curve — diffusion is slow and averages into the same ~2% exponential. Recursive self-improvement has run for centuries (Industrial Revolution, compilers, search). ‘A firecracker event seen in slow motion.’
Contested live [?]: Dwarkesh argues true AGI is labour itself and should trigger a regime change like 0.2%→2%. Karpathy is ‘suspicious’ of a ‘God in a box’ discrete jump with no historical precedent and expects gradual diffusion instead. The rebuttal is not fully resolved.
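The size of the 0.2%→2% regime change at stake in this exchange is easier to judge as doubling times. The sketch below is plain compound-growth arithmetic; the 20% rate is a purely hypothetical ‘discontinuity’ figure added for contrast and does not come from the conversation.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for output to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


for label, rate in [
    ("pre-industrial, 0.2%", 0.002),
    ("modern, 2%", 0.02),
    ("hypothetical, 20%", 0.20),   # assumption for contrast only
]:
    print(f"{label:>22}: doubles in about {doubling_time_years(rate):.0f} years")
```

Moving from 0.2% to 2% cut the doubling time by roughly a factor of ten, which is the scale of change Dwarkesh argues true AGI should repeat and Karpathy expects to be smoothed out by slow diffusion.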

Superintelligence as foreign automation [§ ASI]

ASI is the extrapolation of automation and will look ‘extremely foreign.’
Most likely outcome: gradual loss of both control and understanding as systems layer up and fewer people grasp them. His sci-fi image is not one entity but multiple competing autonomous entities — some go rogue, others fight them off, ‘a hot pot of autonomous activity.’
Even when entities act on behalf of individuals, society may lose control of the outcomes it wants.

Rarity, niches, and machine culture [§ Evolution of intelligence & culture]

Following Nick Lane: intelligence is evolutionarily recent and surprising; the bacteria→eukaryote step was bottlenecked for ~2 billion years. He would have bet on bigger muscles over intelligence.
Intelligence may have arisen several times (hominid vs bird brains are structurally distinct). Gwern/Carl Shulman niche argument: you need a scalable brain algorithm and a niche that rewards marginal intelligence (hands, tool use, externalised digestion); a bird with a bigger brain just falls out of the air. Unpredictable environments that can’t be baked into the genome incentivise test-time adaptability.
LLMs lack culture and self-play: no scratchpad they edit as they work, no LLM writing a book for other LLMs, no AlphaGo-style competition generating problems for each other. Multi-agent systems, culture, and organisations are powerful, unclaimed ideas.
Current models ‘feel like a kindergarten or elementary-school student’ — savant kids with perfect memory who pass PhD quizzes but are cognitively immature, so they cannot yet create culture.

The demo-to-product gap [§ Why self driving took so long]

Self-driving is ‘not even near done.’ Demos date to CMU 1986; he had a flawless Waymo ride in 2014 — and it still took a decade-plus.
The demo-to-product gap: demos are easy, products hard, especially where the cost of failure is high. Production software shares this property (a bug is a security breach); vibe coding does not.
The march of nines: each nine of reliability is a roughly constant quantum of work. Five years at Tesla bought ~2–3 nines. ‘I’m very unimpressed by demos.’
Waymo is still uneconomical and quietly teleoperated — humans moved out of sight, not removed. He judges Tesla’s vision-led approach more scalable.
Bits beat atoms by ~1,000,000× for adaptation speed, but knowledge work at scale still faces latency, legal, insurance, and societal layers (‘what is the equivalent of a cone on a Waymo?’).
The compute build-out echoes the 1990s telecom/railroad over-build that pre-paved the internet. He sounds pessimistic only against fundraising-driven hype; he is bullish and does not think compute is over-built — demand (Claude Code, Codex) barely existed a year ago.
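The march of nines in this section amounts to saying that effort grows linearly in the number of nines, that is, in the negative base-10 logarithm of the failure rate. The sketch below assumes a constant cost per nine; the figure of two years per nine is a loose, hypothetical reading of ‘five years bought ~2–3 nines’, not a number stated in the conversation.

```python
import math


def nines(failure_rate: float) -> float:
    """Number of nines of reliability, e.g. a failure rate of 0.001 is 3 nines (99.9%)."""
    return -math.log10(failure_rate)


def years_remaining(current_failure_rate: float,
                    target_failure_rate: float,
                    years_per_nine: float) -> float:
    """Years of work left if every additional nine costs the same constant effort."""
    gap = nines(target_failure_rate) - nines(current_failure_rate)
    return max(0.0, gap) * years_per_nine


# Hypothetical example: a demo that works 99% of the time, a product that needs 99.999%.
ASSUMED_YEARS_PER_NINE = 2.0   # assumption for illustration only
estimate = years_remaining(1e-2, 1e-5, ASSUMED_YEARS_PER_NINE)
print(f"roughly {estimate:.1f} years from demo to product under these assumptions")
```

A demo at 99% already looks finished to a casual observer, yet under a constant cost per nine most of the work is still ahead, which is the reasoning behind ‘I’m very unimpressed by demos.’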

Education as the human stake [§ Future of education]

Building Eureka / Starfleet Academy — an elite institution for technical knowledge. The fear is a WALL-E / Idiocracy future where humanity is sidelined; he wants humans to flourish, and judges he can add more unique value here than in incremental frontier-lab work.
The Korean one-on-one tutor is his benchmark: she modelled what he knew and didn’t from a short conversation and kept him ‘appropriately challenged’ so that ‘I was the only constraint to learning.’ No LLM does this now; the bar is too high, so it is not yet the time to build the AI tutor.
His value as an AI consultant was often telling firms not to use AI; the same holds in education today. He is building something more conventional first (physical + digital), with nanochat as the capstone of his LLM101n course.
‘Pre-AGI education is useful. Post-AGI education is fun’ — like the gym, which we attend though machines do the lifting. A good-enough tutor makes learning trivial and desirable; ‘anyone will speak five languages because why not.’ He bets on the timelessness of human nature (aristocrats, ancient Greece flourished cognitively).
Teaching method, from a physics training: find the first-order terms and ‘assume a spherical cow’ (he recommends the book Scale). micrograd’s 100 lines are the essence of backpropagation; ‘everything else is efficiency.’ Present the pain before the solution; make the student guess first to ‘maximise knowledge per new fact added’; start the transformer from a bigram lookup table. Beware the curse of expertise and ‘just say the thing’: the lunch-table explanation beats the jargon-filled abstract. Alternate depth-wise (on-demand, project-rewarded) and breadth-wise learning, and teach others to find the gaps in your own understanding.
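The bigram lookup table mentioned above as a starting point is small enough to write out in full. The sketch below is a character-level, count-based version written for these notes; it is an assumption about what such a table looks like, not code from Karpathy’s course, and the training string is a toy example.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Build a lookup table: current character -> {next character: probability}."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    table = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        table[current] = {ch: n / total for ch, n in followers.items()}
    return table


def most_likely_next(table, current):
    """Return the highest-probability next character after `current`."""
    followers = table[current]
    return max(followers, key=followers.get)


table = train_bigram("abab ac")   # toy corpus
print(table["a"])
print(most_likely_next(table, "a"))
```

The table predicts each character from the previous one alone. Seeing how little that context can express is the ‘pain’ that motivates the steps leading up to a transformer.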
