Reading Notes

From Vibe Coding to Agentic Engineering

Source: raw/llm-vibe-coding-script.md | Author: Andrej Karpathy | 2025


Four questions [Adler frame]

Q1 — What is it about as a whole? A conversation-format talk tracking the shift from experimental AI-assisted coding to Software 3.0: a new programming paradigm in which natural language is the programming language and the LLM is the interpreter. The central claim: this shift is not incremental acceleration but a categorical change.

Q2 — How is it argued? Through first-person accounts (Karpathy’s December experience), concrete product examples (MenuGen, the Claude Code installer), and a framework (Software 1.0/2.0/3.0) that gives the claim a concrete structure. A verifiability analysis then explains why AI capability is jagged.

Q3 — Is it true, in whole or part? The Software 1.0/2.0/3.0 framework is a useful organising lens, not a strict empirical claim. The verifiability explanation for capability clustering is consistent with the published RL literature. The “neural computer” endpoint is speculative; Karpathy acknowledges this. The vibe coding / agentic engineering distinction is practically useful for setting expectations.

Q4 — What of it? Three actionable implications: (1) invest in agentic tooling as a core engineering skill; (2) when choosing where to build, look for verifiable domains; (3) update hiring practices to test agentic engineering capability.


Glossary

  • Software 1.0 — explicit code; deterministic; engineer-authored.
  • Software 2.0 — neural network weights; the program is learned from training data rather than written by hand.
  • Software 3.0 — prompt as program; LLM as interpreter; natural language is the programming language.
  • Vibe coding — narrating intent to an agent and delegating the code-writing to it. Raises the floor.
  • Agentic engineering — coordinating agents to maintain professional quality. Raises the ceiling.
  • Verifiability — the property that makes a domain tractable for RL; enables automatic reward signals.
  • Jaggedness — uneven capability profile; gaps reflect RL investment decisions.
  • Neural computer — speculative endpoint: device that processes raw A/V through a neural network; no traditional OS.
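
To make the first three glossary entries concrete, here is a minimal Python sketch of one toy task (is a review positive?) written three ways. The word lists, the perceptron training loop, the prompt wording and the `llm` callable are all illustrative assumptions, not anything from the talk; the only point is where the "program" lives in each paradigm.

```python
from typing import Callable


# Software 1.0: the engineer writes the decision rule explicitly.
def is_positive_v1(text: str) -> bool:
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return score > 0


# Software 2.0: the "program" is a set of weights found by optimisation on
# labelled examples (here, a tiny bag-of-words perceptron).
def train_v2(
    examples: list[tuple[str, int]], epochs: int = 20, lr: float = 0.1
) -> tuple[dict[str, float], float]:
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            words = text.lower().split()
            activation = bias + sum(weights.get(w, 0.0) for w in words)
            error = label - (1 if activation > 0 else 0)
            if error:  # perceptron rule: update only on mistakes
                bias += lr * error
                for w in words:
                    weights[w] = weights.get(w, 0.0) + lr * error
    return weights, bias


def is_positive_v2(text: str, weights: dict[str, float], bias: float) -> bool:
    return bias + sum(weights.get(w, 0.0) for w in text.lower().split()) > 0


# Software 3.0: the program is a natural-language prompt; an LLM interprets it.
PROMPT_V3 = "Answer yes or no: is the following review positive?\n\n{review}"


def is_positive_v3(text: str, llm: Callable[[str], str]) -> bool:
    # `llm` is a hypothetical callable: prompt in, completion text out.
    reply = llm(PROMPT_V3.format(review=text))
    return reply.strip().lower().startswith("yes")
```

Changing behaviour means editing `is_positive_v1` in 1.0, changing the data and retraining in 2.0, and editing `PROMPT_V3` in 3.0.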

Key claims by section

The December shift [§ Feeling Behind as a Coder]

  • Shift happened December 2024. Karpathy had used agentic tools for ~1 year. [§ Feeling Behind as a Coder]
  • Previous experience: model produced chunks, he corrected them. December: no corrections needed. [§ Feeling Behind as a Coder]
  • Key: “a different relationship with the machine.” Not just faster — fundamentally different. [§ Feeling Behind as a Coder]

Software 3.0 [§ Software 3.0 Explained]

  • LLM = programmable computer of a new kind; multitasks because trained on the internet. [§ Software 3.0 Explained]
  • The way you program it: natural language. [§ Software 3.0 Explained]
  • What lives in the context window is your lever over the LLM. [§ Software 3.0 Explained]
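
One way to read "the context window is your lever" is that deciding what goes into a limited window is itself the programming act. The sketch below assumes a fixed token budget, a crude word-count stand-in for a real tokenizer, and a greedy keep-by-priority rule; the `ContextItem` type and `build_context` function are hypothetical and none of these choices come from the talk.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    priority: int  # higher means more important to keep


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_context(items: list[ContextItem], budget: int) -> str:
    """Greedily keep the highest-priority items that fit within `budget`
    tokens, then emit the kept items in their original order."""
    kept: set[int] = set()
    used = 0
    for i in sorted(range(len(items)), key=lambda j: -items[j].priority):
        cost = approx_tokens(items[i].text)
        if used + cost <= budget:
            kept.add(i)
            used += cost
    return "\n\n".join(item.text for i, item in enumerate(items) if i in kept)


items = [
    ContextItem("instructions", "You are a careful installer. Report each step.", 10),
    ContextItem("reference", "reference text " * 20, 1),
    ContextItem("task", "Install the tool on this machine.", 9),
]
print(build_context(items, budget=15))
```

With a 15-token budget the 40-word reference text is dropped and only the instructions and the task reach the model; raising the budget changes the "program" without touching any code.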

Agent as installer [§ Agents as the Installer]

  • Claude Code installation: a block of text you paste into the agent. [§ Agents as the Installer]
  • Agent reads environment, executes steps, debugs in a loop. No explicit conditionals needed. [§ Agents as the Installer]
  • The programming artefact is no longer a script — it’s a prompt. [§ Agents as the Installer]

MenuGen vs raw prompts [§ Menu Gen vs Raw Prompts]

  • MenuGen: OCR pipeline + image generation + Vercel app. Software 1.0. [§ Menu Gen vs Raw Prompts]
  • Software 3.0 version: photograph menu, give to multimodal model with one prompt → annotated image. No pipeline. [§ Menu Gen vs Raw Prompts]
  • “All of my MenuGen is spurious. The app shouldn’t exist.” [§ Menu Gen vs Raw Prompts]
  • New things now possible that couldn’t exist before: not faster but categorically new. [§ Menu Gen vs Raw Prompts]
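
The installer pattern noted under "Agent as installer" (read the environment, run a step, debug in a loop, no explicit conditionals) can be sketched as a short loop. This is a minimal Python sketch, assuming a hypothetical `propose` callable that wraps the model: it sees the pasted instructions plus every command and output so far, and returns the next shell command or `None` when it judges the job done. The step cap and history format are also assumptions.

```python
import subprocess
from typing import Callable, Optional

# One entry per step: (command, exit code, combined stdout/stderr).
History = list[tuple[str, int, str]]


def run_shell(command: str) -> tuple[int, str]:
    """Run one shell command and capture its exit code and output."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def agent_install(
    instructions: str,
    propose: Callable[[str, History], Optional[str]],
    execute: Callable[[str], tuple[int, str]] = run_shell,
    max_steps: int = 20,
) -> History:
    """Ask the model for the next command, run it, feed the result back.

    `propose` is a hypothetical wrapper around an LLM: it sees the pasted
    instructions plus every command and output so far, and returns the next
    shell command, or None once it judges the installation complete.
    """
    history: History = []
    for _ in range(max_steps):
        command = propose(instructions, history)
        if command is None:
            break
        exit_code, output = execute(command)
        # No hand-written branch per failure mode: the error text simply goes
        # back into the next call to `propose`, which is where debugging happens.
        history.append((command, exit_code, output))
    return history
```

The pasted instructions are the program and `propose` is the interpreter; the Python around them is only plumbing. As written, the sketch executes whatever `propose` returns, so anything beyond a toy would need a confirmation step or a sandbox in front of `execute`.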

Neural computer [§ What’s Obvious by 2026]

  • Endpoint: device takes raw A/V → neural network → diffusion renders UI. No OS. CPUs as co-processors. [§ What’s Obvious by 2026]
  • “Extremely foreign” — Karpathy’s own descriptor. [§ What’s Obvious by 2026]
  • Compute spent on neural networks (“intelligence”) is already the dominant share of FLOPs. [§ What’s Obvious by 2026]
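  • 1950s analogy: the calculator path and the neural-network path were both live options; the calculator path won. Now the diagram may flip, with the neural network as the main processor and the CPU as a co-processor. [§ What’s Obvious by 2026]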

Verifiability [§ Verifiability and Jagged Skills]

  • RL requires reward signal, which requires verifier. Labs build RL environments where verifiers are cheap + valuable. [§ Verifiability and Jagged Skills]
  • Maths and code: verifiable → fast improvement. [§ Verifiability and Jagged Skills]
  • Car-wash example: “I want to go to a car wash 50 metres away. Should I drive or walk?” → models say walk. Wrong: the car is what needs washing, so you have to drive it there. [§ Verifiability and Jagged Skills]
  • Chess spike GPT-3.5 → GPT-4: someone at OpenAI added a large chess corpus. Capability follows data decisions. [§ Verifiability and Jagged Skills]
  • “You are somewhat at the mercy of what the labs put in the mix.” [§ Verifiability and Jagged Skills]
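
The chain "RL needs a reward, a reward needs a verifier" can be sketched in Python. The three verifiers below (an exact-answer check for arithmetic, a unit-test pass rate for code, and a stand-in that returns `None` for common-sense questions) are illustrative assumptions, not a description of how any lab builds its RL environments.

```python
from typing import Callable, Optional

# A verifier maps one model output to a reward, or None if it cannot judge it.
Verifier = Callable[[str], Optional[float]]


def make_sum_verifier(a: int, b: int) -> Verifier:
    """Maths-style task: cheap, exact check of the final answer."""
    def verify(answer: str) -> Optional[float]:
        try:
            return 1.0 if int(answer.strip()) == a + b else 0.0
        except ValueError:
            return 0.0  # an unparseable answer earns no reward
    return verify


def make_code_verifier(tests: list[tuple[int, int]]) -> Verifier:
    """Code-style task: reward is the fraction of unit tests a candidate `f` passes."""
    def verify(source: str) -> Optional[float]:
        namespace: dict = {}
        try:
            exec(source, namespace)  # untrusted code: a real environment would sandbox this
            f = namespace["f"]
            return sum(f(x) == want for x, want in tests) / len(tests)
        except Exception:
            return 0.0  # crashes or a missing `f` earn no reward
    return verify


def commonsense_verifier(answer: str) -> Optional[float]:
    # Questions like the car-wash one have no cheap, general automatic check;
    # grading them at scale needs human (or model) judgement. Modelled as None.
    return None


def rewards(samples: list[str], verify: Verifier) -> Optional[list[float]]:
    """Score a batch of sampled outputs; None means there is no automatic
    reward signal, so RL on this task cannot run unattended."""
    scores = [verify(s) for s in samples]
    if any(s is None for s in scores):
        return None
    return [float(s) for s in scores]


print(rewards(["5", "6", "five"], make_sum_verifier(2, 3)))  # [1.0, 0.0, 0.0]
print(rewards(["def f(x): return 2 * x", "def f(x): return x + 1"],
              make_code_verifier([(1, 2), (3, 6)])))  # [1.0, 0.5]
print(rewards(["walk"], commonsense_verifier))  # None
```

Only the first two tasks produce the numbers an RL update needs, which matches the notes’ explanation of jaggedness: maths and code improve fast, while common-sense gaps like the car-wash answer persist.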

Vibe coding vs agentic engineering [§ From Vibe Coding to Agent Engineering]

  • Vibe coding: raises the floor. Anyone can build. [§ From Vibe Coding to Agent Engineering]
  • Agentic engineering: preserves the professional quality bar. Vulnerabilities introduced through careless agent use are still your responsibility. [§ From Vibe Coding to Agent Engineering]
  • 10× engineer framing obsolete. Fully AI-native people operate at “multiples that dwarf 10×.” [§ From Vibe Coding to Agent Engineering]
  • Proposed hiring test: build a large project, deploy it, then attack it with agents. Does it hold up? [§ From Vibe Coding to Agent Engineering]