Notes — Boris Cherny on Claude Code

Source: Lenny’s Podcast, ~18,800 words. Two-speaker timestamped transcript. Ad reads (DX, Sentry, Metaview) excluded. Boris Cherny is head of Claude Code at Anthropic.

Four questions [Adler frame]

Q1. What is it about? The origins, growth philosophy, and product principles behind Claude Code; Boris’s view on the near future of software engineering; Anthropic’s safety model for agentic AI; and practical advice for building AI products and teams.

Q2. How is it argued? Primarily through direct experience — Boris is both the creator of Claude Code and its heaviest daily user (100% AI-written code since November 2025, 10–30 PRs/day). Principles are derived from practice, then generalised. Historical analogy (printing press) is used to situate the transition. Some quantitative claims (4% of GitHub commits, 200% productivity increase at Anthropic) are cited from external reports or internal data.

Q3. Is it true? Boris is the primary authority on Claude Code’s history and Anthropic’s internal operating model. The 4% GitHub commit figure comes from a SemiAnalysis report [?source]. Internal productivity numbers (200%, 4× engineering headcount) are plausible but not independently verifiable. The ‘coding is virtually solved’ claim is explicitly scoped to his own work; it may not yet apply equally to all codebases and tech stacks, though Boris predicts convergence over the coming months.

Q4. What of it? If the printing press analogy is right, the short-term disruption is real but the long-term unlock is enormous. The actionable insight for builders: do not over-engineer scaffolding, do not box the model, bet on the most capable model, and build for the model six months from now rather than today’s model. The shift from specialist to generalist/builder is already happening on Boris’s own team.

Glossary

Claude-ify — Boris’s team verb for automating a task or workflow using Claude rather than doing it manually. ‘Under-fund things a little bit and they’ll Claude-ify.’ [§ Team operating principles]

Plan mode — A one-sentence injection into the model’s prompt: ‘Please don’t write any code yet.’ Keeps the model in back-and-forth dialogue until the plan is agreed. Accessed via shift-tab twice in the terminal. [§ Claude Code tips]

Multi-Claude-ing — Running multiple Claude Code sessions in parallel, each on a different task. Boris typically has 5 agents running concurrently. [§ Multi-agent use]

Latent demand — Product signal embedded in how users already behave, often by hacking or misusing existing products to do something those products weren’t designed for. Boris argues there is now a second form: looking at what the model itself is trying to do. [§ Latent demand]

On distribution — Research term for model behaviour that is in line with its training distribution; Boris uses it to mean: let the model do what it naturally wants to do, rather than constraining it with rigid orchestration. [§ Don’t box the model]

The bitter lesson — Essay by Rich Sutton (2019) arguing that general methods that scale with computation eventually outperform approaches built on specialised human knowledge. Boris applies this to product: don’t fine-tune, don’t over-scaffold; bet on the frontier model. [§ Bitter lesson]

Race to the top — Anthropic’s internal principle of open-sourcing safety infrastructure (e.g., the agent sandbox) to raise the industry standard, rather than treating safety practices as competitive advantage. [§ Safety model]

ASL-3 — AI Safety Level 3, the third tier of safeguards in Anthropic’s Responsible Scaling Policy. Opus 4 was the first model of the Claude Code era to be classified at ASL-3. [§ Safety model]

Superposition — Mechanistic interpretability finding that a single neuron in a large model can correspond to many concepts simultaneously; the neuron’s meaning is resolved by which other neurons activate alongside it. [§ Mechanistic interpretability]

Cowork — Anthropic’s desktop-native agentic product, built in 10 days using Claude Code, aimed at non-engineering tasks. Uses the same Claude Code agent, packaged in a desktop app with a Chrome integration and a sandboxed VM for safety. [§ Cowork]

Origins of Claude Code

Boris joined Anthropic from Meta/Instagram, initially spending one month hacking prototypes and one month doing post-training research. His rationale: to do good engineering work in AI you have to understand the model, just as traditional engineers understand the runtime or VM. [§ Origins]

The first prototype was called Claude CLI. Boris tested it by asking ‘What music am I listening to?’: the model had been given a bash tool and, without explicit instruction, worked out how to use it to find the answer. The internal post announcing it received two likes. [§ Origins]

Terminal-first was not a deliberate design choice; it was simply the fastest way to build alone. The team later considered other form factors but stayed with the terminal because it was the only interface that could keep pace with rapid model improvement. [§ Origins]

Claude Code was built within the ‘Anthropic Labs’ team that also produced MCP and the desktop app — all expressions of the same Anthropic roadmap: coding → tool use → computer use. [§ Origins]

Growth and impact

Growth has been continuously accelerating, not merely growing. 4% of all public GitHub commits are now authored by Claude Code; Boris estimates the share in private repositories is meaningfully higher. [§ Impact] [?source for SemiAnalysis report]

Claude Code was not an immediate hit externally. After the February 2025 external release, adoption built slowly over months. The first sharp inflection came with Opus 4 / Sonnet 4 (May 2025). Growth has compounded since, with DAUs doubling in the month before recording. [§ Growth inflections]

At Anthropic, engineering headcount roughly quadrupled in the year since Claude Code launched, while productivity per engineer, measured in PRs, rose ~200%. Boris calls these numbers ‘absolutely insane’ compared with the few-percentage-point annual gains he tracked at Meta. [§ Productivity]
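The two figures compound, which is easy to miss. The sketch below is a back-of-envelope calculation, not a number from the podcast: it assumes that a "~200% increase" means per-engineer PR output reached 3× its baseline, and that both multipliers cover the same period. The function name is hypothetical.

```python
# Back-of-envelope combination of the Anthropic figures quoted above.
# Assumptions (not stated in the source): "~200% increase" means per-engineer
# PR output is 3x baseline, and both multipliers apply over the same year.


def total_output_multiplier(headcount_multiplier: float, per_engineer_increase_pct: float) -> float:
    """Combine a headcount multiplier with a percentage increase in per-engineer output."""
    per_engineer_multiplier = 1 + per_engineer_increase_pct / 100
    return headcount_multiplier * per_engineer_multiplier


multiplier = total_output_multiplier(headcount_multiplier=4.0, per_engineer_increase_pct=200.0)
print(f"Implied total PR throughput: ~{multiplier:.0f}x baseline")
```

The implied ~12× total PR throughput is derived here from the two quoted numbers; it holds only if both numbers hold and are measured over the same window.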

Latent demand (two forms)

Boris identifies latent demand as ‘the single most important principle in product.’ [§ Latent demand]

Form 1 — user latent demand (traditional): Look at how users hack or misuse existing products to meet a need the product wasn’t designed for.

Classic examples:

Facebook Marketplace: 40% of Facebook Groups posts were buying and selling before Marketplace existed.
Facebook Dating: 60% of profile views were non-friends of the opposite gender.

Claude Code example: a data scientist (Brendan) learned to open a terminal and run SQL queries through Claude Code, and the following week all the data scientists were doing it. People were also using Claude Code to grow tomato plants, analyse genomes, recover corrupted wedding photos, and interpret MRIs. [§ Cowork origins]

Form 2 — model latent demand (modern): Look at what the model is trying to do, not what you designed it to do.

For Claude Code: instead of treating the model as a component inside a larger system with pre-specified tool-calling sequences, Boris inverted this and made the model itself the product: minimal scaffolding and minimal pre-specified tool order. The model decides which tools to call and in what order. This is what being ‘on distribution’ means here. [§ Don’t box the model]
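The contrast between a boxed pipeline and a model-driven loop can be sketched in a few lines. This is an illustrative sketch, not Claude Code’s implementation: `Action`, the tool names `read_file` and `search`, and the scripted function standing in for the model are all hypothetical. A real product would replace `scripted_model` with an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tools: plain functions from a string argument to a string result.
Tool = Callable[[str], str]


@dataclass
class Action:
    tool: Optional[str]  # name of the tool to call, or None when the model is finished
    argument: str        # tool input, or the final answer when tool is None


# A "model" is any function that looks at the goal and the history so far
# and chooses the next action. In a real product this would be an LLM call.
Model = Callable[[str, list[str]], Action]


def run_boxed(goal: str, tools: dict[str, Tool], steps: list[str]) -> list[str]:
    """Boxed design: the harness fixes which tools run, and in what order."""
    return [f"{name}({goal}) -> {tools[name](goal)}" for name in steps]


def run_on_distribution(goal: str, tools: dict[str, Tool], model: Model, max_steps: int = 10) -> list[str]:
    """Minimal-scaffolding design: the model picks every tool call itself."""
    history: list[str] = []
    for _ in range(max_steps):  # a step cap is the only fixed control flow
        action = model(goal, history)
        if action.tool is None:
            history.append(f"done: {action.argument}")
            break
        result = tools[action.tool](action.argument)
        history.append(f"{action.tool}({action.argument}) -> {result}")
    return history


if __name__ == "__main__":
    tools: dict[str, Tool] = {
        "read_file": lambda path: f"<contents of {path}>",
        "search": lambda query: f"<matches for {query}>",
    }

    def scripted_model(goal: str, history: list[str]) -> Action:
        # Stand-in for an LLM: reads one file, then stops. It never calls "search",
        # which a fixed pipeline would have forced it to run.
        if not history:
            return Action("read_file", "README.md")
        return Action(None, f"answered: {goal}")

    for line in run_on_distribution("summarise the repo", tools, scripted_model):
        print(line)
```

In `run_boxed` the tool order is hard-coded, so a more capable model cannot skip or reorder steps. In `run_on_distribution` the only fixed parts are the tool registry and the step cap, so the same harness benefits automatically when the model improves.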

Build for the model 6 months from now

Claude Code’s first 6 months had poor product-market fit (PMF) because the model wasn’t yet capable enough to justify the design. Boris accepted this as the price of being well positioned when the model caught up. [§ Build for 6 months out]

Practical implication for AI startups: accept that your product will underperform for 6 months; when the model catches up, you hit the ground running. [§ Advice for startups]

Two predictable model improvements to bet on:

Better and longer tool use / computer use.
Longer autonomous operation without human intervention. Sonnet 3.5 would go off the rails within 15–30 seconds; Opus 4.6 runs unattended for 10–30 minutes on average, and can run for hours or even days. [§ Autonomous operation]

The bitter lesson

Rich Sutton’s original claim: general methods that scale with computation eventually outperform more specialised, knowledge-engineered ones. Boris applies it to AI product building: [§ Bitter lesson]

Don’t try to fine-tune for specific tasks.
Don’t impose strict multi-step orchestration workflows on the model.
Don’t try to improve performance by adding scaffolding — gains are at most 10–20% and are typically wiped out by the next model release.
Give the model tools and a goal; let it plan the execution.

Corollary: the model doesn’t need elaborate context injected upfront. Give it a tool to retrieve context when it needs it. [§ Don’t box the model]
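The corollary can be sketched with a toy in-memory document store. Everything here is hypothetical and chosen for illustration: `DOCS`, `get_doc`, and both prompt builders are not part of any real Claude Code API. The sketch only shows the shape of the two approaches: pasting all context upfront versus advertising a retrieval tool the model can call when it decides it needs something.

```python
# Hypothetical in-memory "codebase"; in practice this would be files on disk.
DOCS = {
    "auth.py": "def login(user): ...",
    "billing.py": "def charge(card): ...",
    "README.md": "Project overview ...",
}


def upfront_prompt(goal: str) -> str:
    """Boxed approach: paste every document into the prompt before the model starts."""
    context = "\n\n".join(f"# {name}\n{body}" for name, body in DOCS.items())
    return f"{context}\n\nTask: {goal}"


def on_demand_prompt(goal: str) -> str:
    """Corollary approach: describe a retrieval tool and list what it can fetch."""
    available = ", ".join(sorted(DOCS))
    return f"Task: {goal}\nTools: get_doc(name) -> str; available: {available}"


def get_doc(name: str) -> str:
    """Retrieval tool: the model calls this only when it decides it needs the text."""
    return DOCS.get(name, f"no such document: {name}")


if __name__ == "__main__":
    goal = "fix the login bug"
    print("upfront prompt chars:", len(upfront_prompt(goal)))
    print("on-demand prompt chars:", len(on_demand_prompt(goal)))
    print(get_doc("auth.py"))
```

With three tiny documents the difference is trivial. The point is that the upfront prompt grows with the size of the codebase, while the on-demand prompt grows only with the list of names, and the choice of what to read moves from the harness to the model.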

Team operating principles

Under-fund to force Claude-ification. Intentionally putting one engineer on a project creates intrinsic motivation to automate with Claude. People find ways to Claude-ify when they have to ship fast. [§ Under-funding]

Speed as the only early advantage. When the team was a single person, its only edge over well-resourced competitors was pace. ‘If you can do something today, do it today.’ The principle still holds on the team. [§ Speed]

Give engineers maximum tokens. Start by giving engineers unlimited token access to encourage experimentation, and don’t optimise cost early. Once something works at scale, cost-optimise it (switch to Sonnet or Haiku; distil into prompts). Until an idea is proven, token cost per engineer is small relative to salary. [§ Tokens]

Everyone on the team codes. PM, engineering manager, designer, finance, data scientist: everyone on the Claude Code team codes. Claude Code lets people code without deep expertise in a given language or stack; one engineer built a Go service over a month and ‘still doesn’t really know Go.’ [§ Generalist team]

Safety: three layers

Layer 1 — Alignment + mechanistic interpretability. During training. Understand what neurons are doing, which concepts they encode, whether deception-related neurons are activating. See Mechanistic Interpretability. [§ Safety model]

Layer 2 — Evals. Laboratory/Petri dish setting. Synthetic situations. Does the model do the right thing? See Evals. [§ Safety model]

Layer 3 — In-the-wild behaviour. As models become more capable, the first two layers are insufficient. Real-world deployment surfaces alignment failures that evals miss. [§ Safety model]

Claude Code was used internally at Anthropic for 4–5 months before external release, specifically to study in-the-wild safety on what was then the first broadly used agentic coding tool. Cowork followed the same pattern (internal → small customers → research preview release). [§ Why release early]

Race to the top: Anthropic open-sourced its agent sandbox (works with any agent, not just Claude Code) to raise the industry floor on agent safety. [§ Race to the top]

Mechanistic interpretability

Chris Olah’s field (Boris recommends Lenny have him on). Core idea: study neural networks at the same level of detail that neuroscientists study animal brains — one neuron at a time. [§ Mechanistic interpretability]

Key finding: model neurons are not identical to biological neurons but behave similarly in many ways. Specific layers or neurons map to specific concepts. The model does planning and forward-thinking; there is now ‘quite strong evidence’ it does something deeper than next-token prediction. [§ Mechanistic interpretability]

Superposition: in larger models, a single neuron may correspond to a dozen concepts simultaneously. Meaning is resolved by which neurons co-activate. This is more sophisticated than one-neuron-one-concept, and suggests the information density in large models is much higher than earlier work implied. [§ Superposition]

See Mechanistic Interpretability.

Printing press analogy

Pre-Gutenberg Europe: <1% of population literate (scribes employed by lords and kings who themselves were often illiterate). In the 50 years after the printing press, more material was printed than in the prior 1,000 years. Cost of printing fell ~100× in 50 years. Literacy rose to ~70% globally over the following 200 years. [§ Printing press]

Boris’s parallel: coding was locked to a tiny specialist class; AI will democratise it as printing democratised literacy. Short-term disruption is real and painful. Long-term unlock is unpredictable — just as no one in 1450 could have predicted the Renaissance, no one can predict what universal programming access enables. [§ Printing press]

A scribe interviewed after Gutenberg said the tasks they hated (copying between books) were now automated, while the tasks they loved (illustrating, binding) were freed up. Boris sees the same pattern: the tedious parts of coding (minutiae, tooling, dependencies) are now automated; what’s left is thinking about what to build, talking to users, and designing systems. [§ Printing press]

Claude Code tips (Boris’s own practice)

Use the most capable model (Opus 4.6, max effort). Less intelligent models often end up using more tokens on the same task via correction loops. The best model is frequently not more expensive. [§ Tips]
Start in plan mode (~80% of tasks). Shift-tab twice in the terminal. Agree on the plan before the model writes any code. After plan approval, auto-accept edits; with Opus 4.6, one-shot execution is the norm. [§ Tips]
Try multiple form factors. Terminal, desktop app, iOS, Slack. Boris now splits roughly 1/3 terminal / 1/3 desktop / 1/3 iOS. ‘The same Claude agent is running everywhere.’ [§ Tips]
Always have multiple Claudes running. Boris typically has 5 agents running in parallel across different tasks. [§ Multi-agent use]

What comes after coding

Boris argues: ‘Coding is virtually solved’ for his type of work. The next frontiers: [§ What’s next]

Idea generation. Claude is already looking at bug reports, telemetry, and feedback channels to propose fixes and features — acting more like a coworker than a tool.
Non-engineering work (Cowork). Project management, email, parking tickets, Slack follow-ups — anything done on a computer. Cowork is the first product aimed at non-engineers.
Role collapse. ‘Software engineer’ as a title will begin to disappear in 2026, replaced by ‘builder.’ By year end, there may be 50% overlap between PM, engineer, and designer roles. [§ Role collapse]

Historical analogy: programming has repeatedly moved up a layer of abstraction (punch cards → software → high-level languages → AI-generated code), and each generation said ‘that’s not really coding.’ [§ Programming continuum]