Notes — From Vibe Coding to Agentic Engineering

Notes on Andrej Karpathy — standalone talk, 2025.

Four questions [Adler frame]

Q1 — What is it about as a whole? A conversation-format talk tracking the shift from experimental AI-assisted coding to Software 3.0: a new programming paradigm in which natural language is the programming language and the LLM is the interpreter. The central claim: this shift is not incremental acceleration but a categorical change.

Q2 — How is it argued? First-person accounts (Karpathy’s December experience), concrete product examples (MenuGen, Claude Code installer), and a framework (Software 1.0/2.0/3.0) that makes the claim falsifiable. The verifiability analysis explains the jaggedness of AI capability.

Q3 — Is it true, in whole or part? The Software 1.0/2.0/3.0 framework is a useful organising lens, not a strict empirical claim. The verifiability explanation for capability clustering is consistent with the published RL literature. The ‘neural computer’ endpoint is speculative; Karpathy acknowledges this. The vibe coding / agentic engineering distinction is practically useful for setting expectations.

Q4 — What of it? Three actionable implications: (1) invest in agentic tooling as a core engineering skill; (2) when choosing where to build, look for verifiable domains; (3) update hiring practices to test agentic engineering capability.

Glossary

Software 1.0 — explicit code; deterministic; engineer-authored.
Software 2.0 — neural network weights; the program is learned from training data rather than written by hand.
Software 3.0 — prompt as program; LLM as interpreter; natural language is the programming language.
Vibe coding — narrating intent to an agent and deferring code-writing. Raises the floor.
Agentic engineering — coordinating agents to maintain professional quality. Raises the ceiling.
Verifiability — the property that makes a domain tractable for RL; enables automatic reward signals.
Jaggedness — uneven capability profile; gaps reflect RL investment decisions.
Neural computer — speculative endpoint: device that processes raw A/V through a neural network; no traditional OS.

Key claims by section

The December shift [§ Feeling Behind as a Coder]

Shift happened December 2024. Karpathy had used agentic tools for ~1 year. [§ Feeling Behind as a Coder]
Previous experience: model produced chunks, he corrected them. December: no corrections needed. [§ Feeling Behind as a Coder]
Key: ‘a different relationship with the machine.’ Not just faster — fundamentally different. [§ Feeling Behind as a Coder]

Software 3.0 [§ Software 3.0 Explained]

LLM = programmable computer of a new kind; multitasks because trained on the internet. [§ Software 3.0 Explained]
The way you program it: natural language. [§ Software 3.0 Explained]
What lives in the context window is your lever over the LLM. [§ Software 3.0 Explained]
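
A minimal sketch of the "prompt as program" idea, assuming a hypothetical `complete` callable that stands in for any LLM client (no real API is implied). The same task is written once as explicit Software 1.0 code and once as text assembled into a context window. The names `build_context` and `fake_llm`, and the example reviews, are illustrative and do not come from the talk.

```python
from typing import Callable

# Software 1.0: the behaviour is spelled out in explicit, deterministic code.
def is_positive_v1(review: str) -> bool:
    positive_words = {"great", "good", "excellent"}
    return any(word.strip(".,!").lower() in positive_words for word in review.split())

# Software 3.0: the "program" is the text placed in the context window.
def build_context(instructions: str, examples: list[tuple[str, str]], user_input: str) -> str:
    parts = [instructions]
    for text, label in examples:
        parts.append(f"Review: {text}\nLabel: {label}")
    # Leave the final label open for the model to fill in.
    parts.append(f"Review: {user_input}\nLabel:")
    return "\n\n".join(parts)

def is_positive_v3(review: str, complete: Callable[[str], str]) -> bool:
    prompt = build_context(
        "Label each review as positive or negative.",
        [("Loved it.", "positive"), ("Waste of money.", "negative")],
        review,
    )
    return complete(prompt).strip().lower() == "positive"

# Stand-in for a real model so the sketch runs offline (assumption).
def fake_llm(prompt: str) -> str:
    last_review = prompt.rsplit("Review:", 1)[1]
    return "positive" if "great" in last_review.lower() else "negative"
```

Changing the behaviour of `is_positive_v3` means editing the instructions or examples passed to `build_context`, not the control flow, which is one way to read "the context window is your lever".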

Agent as installer [§ Agents as the Installer]

Claude Code installation: a block of text you paste to the agent. [§ Agents as the Installer]
Agent reads environment, executes steps, debugs in a loop. No explicit conditionals needed. [§ Agents as the Installer]
The programming artefact is no longer a script — it’s a prompt. [§ Agents as the Installer]
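
The read, execute, debug loop described above can be sketched as follows. This is an illustrative toy, not how Claude Code or any real agent is implemented: `propose_command` stands in for the model, and the scripted policy and fake environment are assumptions that exist only so the loop runs offline.

```python
from typing import Callable, Optional

def run_agent(
    goal: str,
    propose_command: Callable[[str, list[str]], Optional[str]],
    execute: Callable[[str], tuple[bool, str]],
    max_steps: int = 10,
) -> list[str]:
    """Loop: ask the model for a command, run it, feed the result back."""
    history: list[str] = []
    for _ in range(max_steps):
        command = propose_command(goal, history)
        if command is None:  # the model decides the goal is met
            break
        ok, output = execute(command)
        history.append(f"{command} -> {'ok' if ok else 'error'}: {output}")
    return history

# Fake environment (assumption): installing "app" fails until "runtime" exists.
installed: set[str] = set()

def fake_execute(command: str) -> tuple[bool, str]:
    _, package = command.split()
    if package == "app" and "runtime" not in installed:
        return False, "missing dependency: runtime"
    installed.add(package)
    return True, f"{package} installed"

# Scripted stand-in for the model: reacts to the last observation.
def scripted_policy(goal: str, history: list[str]) -> Optional[str]:
    if not history:
        return "install app"
    last = history[-1]
    if "missing dependency: runtime" in last:
        return "install runtime"
    if "runtime installed" in last:
        return "install app"
    return None
```

The branching sits in `scripted_policy` only because no model is attached here. In the setup the notes describe, the pasted text carries no conditionals and the agent works out the recovery steps from what it observes.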

MenuGen [§ Menu Gen vs Raw Prompts]

MenuGen: OCR pipeline + image generation + Vercel app. Software 1.0. [§ Menu Gen vs Raw Prompts]
Software 3.0 version: photograph menu, give to multimodal model with one prompt → annotated image. No pipeline. [§ Menu Gen vs Raw Prompts]
‘All of my MenuGen is spurious. The app shouldn’t exist.’ [§ Menu Gen vs Raw Prompts]
New things now possible that couldn’t exist before: not faster but categorically new. [§ Menu Gen vs Raw Prompts]
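
The structural difference between the two MenuGen versions can be sketched as below. Every function here is a hypothetical stand-in (the real app's code is not in these notes, and the hosting layer is omitted); the sketch only contrasts hand-wired stages with a single multimodal request.

```python
from typing import Callable

# Software 1.0 shape: fixed stages wired together by hand.
def menu_pipeline(
    photo: bytes,
    ocr: Callable[[bytes], list[str]],
    generate_image: Callable[[str], bytes],
) -> dict[str, bytes]:
    dishes = ocr(photo)
    return {dish: generate_image(dish) for dish in dishes}

# Software 3.0 shape: one request to a multimodal model, no intermediate stages.
def menu_single_prompt(photo: bytes, multimodal: Callable[[str, bytes], bytes]) -> bytes:
    prompt = "Annotate this menu photo with a picture of each dish."
    return multimodal(prompt, photo)

# Offline stand-ins (assumptions) so the sketch runs without any service.
def fake_ocr(photo: bytes) -> list[str]:
    return photo.decode().split(",")

def fake_image(dish: str) -> bytes:
    return f"<image of {dish}>".encode()

def fake_multimodal(prompt: str, photo: bytes) -> bytes:
    return b"<annotated " + photo + b">"
```

In the first shape the engineer owns every seam between stages; in the second the only authored artefact is the prompt string.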

Neural computer [§ What’s Obvious by 2026]

Endpoint: device takes raw A/V → neural network → diffusion renders UI. No OS. CPUs as co-processors. [§ What’s Obvious by 2026]
‘Extremely foreign’ — Karpathy’s own descriptor. [§ What’s Obvious by 2026]
Intelligence compute already the dominant FLOP share. [§ What’s Obvious by 2026]
1950s analogy: calculator vs neural-network paths were both live options; the calculator won. Now the diagram may flip, with the neural network as the host and CPUs as co-processors. [§ What’s Obvious by 2026]

Verifiability [§ Verifiability and Jagged Skills]

RL requires reward signal, which requires verifier. Labs build RL environments where verifiers are cheap + valuable. [§ Verifiability and Jagged Skills]
Maths and code: verifiable → fast improvement. [§ Verifiability and Jagged Skills]
Car-wash example: ‘I want to go to a car wash 50 metres away. Should I drive or walk?’ → models say walk. Wrong: the car itself has to be at the car wash, so you must drive. [§ Verifiability and Jagged Skills]
Chess spike GPT-3.5 → GPT-4: someone at OpenAI added a large chess corpus. Capability follows data decisions. [§ Verifiability and Jagged Skills]
‘You are somewhat at the mercy of what the labs put in the mix.’ [§ Verifiability and Jagged Skills]
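
To make the link from verifier to reward signal concrete, here is a minimal sketch assuming a binary reward and two toy verifiers. The function names and the 0/1 reward scheme are illustrative assumptions, not a description of any lab's RL environment.

```python
from typing import Callable

def math_verifier(expected: int) -> Callable[[str], bool]:
    """Verifier for an arithmetic task: the answer must parse to `expected`."""
    def verify(answer: str) -> bool:
        try:
            return int(answer.strip()) == expected
        except ValueError:
            return False
    return verify

def code_verifier(tests: list[tuple[int, int]]) -> Callable[[str], bool]:
    """Verifier for a coding task: the submitted source must define `f`
    and pass every (input, expected_output) pair."""
    def verify(source: str) -> bool:
        namespace: dict = {}
        try:
            exec(source, namespace)  # toy only: never exec untrusted code unsandboxed
            return all(namespace["f"](x) == y for x, y in tests)
        except Exception:
            return False
    return verify

def reward(sample: str, verify: Callable[[str], bool]) -> float:
    """Binary reward: 1.0 if the verifier accepts the sample, else 0.0."""
    return 1.0 if verify(sample) else 0.0
```

Both verifiers are a few lines long and need no human in the loop, which is what "cheap" means here. A question like the car-wash one has no comparably mechanical checker, so on the notes' account it is less likely to receive this kind of training signal.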

Vibe coding vs agentic engineering [§ From Vibe Coding to Agent Engineering]

Vibe coding: raises the floor. Anyone can build. [§ From Vibe Coding to Agent Engineering]
Agentic engineering: preserves quality bar. Vulnerabilities from careless agent use are yours. [§ From Vibe Coding to Agent Engineering]
10× engineer framing obsolete. Fully AI-native people operate at ‘multiples that dwarf 10×.’ [§ From Vibe Coding to Agent Engineering]
Hiring test: build a large project, deploy it, then attack it with agents. Does it hold? [§ From Vibe Coding to Agent Engineering]