Notes: Mike Krieger on Anthropic, AI Timelines, and Product in a 90% AI-Written Codebase

Four questions [Adler frame]

Q1 — What is this about? A CPO’s account of what product development looks like when the engineering bottleneck disappears. Krieger’s central claim: Anthropic is ‘patient zero’ for AI-native development (90%+ of its code is written by AI), and this has moved the critical path from implementation to two new bottlenecks — deciding what to build (alignment) and getting changes through production (merge queue, coherent shipping). The secondary layer covers MCP as a solution to the under-invested middle layer of the AI product stack, and what PM skills remain durable when AI handles engineering velocity.

Q2 — How is it argued? Through insider testimony and direct comparison. Krieger is both observer (CPO watching teams operate) and subject (his own coding with AI). The argument is grounded in specific numbers (90%, 95%), specific tools (Claude Code, merge queue), and specific product outcomes (Artifacts, Projects). The MCP argument is architectural: he maps the three-layer product stack (intelligence × context/memory × UI/applications) and identifies context/memory as historically under-invested, then explains why MCP addresses it. The AI timeline argument is illustrative: he describes his reaction to the AI 2027 paper as a product strategy moment rather than a safety moment.

Q3 — Is it true? The 90%+ claim is stated as internal measurement, not independently verified, but is consistent with public accounts from other Anthropic engineers. The new-bottlenecks claim is strongly supported by Krieger’s own case studies (Artifacts emerged from post-training × product intersection; Projects from context/memory investment). The claim that MCP is the missing middle layer is a design argument, not an empirical one: it is coherent and consistent with how integration layers evolve in other technology stacks, but it is too early to know whether MCP achieves what Krieger predicts. The durable PM skills claim (comprehensibility, strategy, opening eyes) is directionally right but underspecifies what ‘comprehensibility’ means operationally.

Q4 — What of it? For PMs: the three skills Krieger names are concrete planning targets — closing the capability-usage gap (comprehensibility), deciding where to compete (strategy), and enlarging people’s model of the possible (vision). For product teams considering AI-native development: the new bottlenecks claim suggests where to invest attention — not in engineering velocity (that is solved) but in alignment clarity and deployment coherence. For anyone building on LLMs: the MCP layer argument is a design prompt — what context and memory does your application need that the model cannot provide natively?

Glossary

Patient zero — Krieger’s term for Anthropic’s position as the first organisation to operate at scale with AI-written code across the full product development cycle. The company experiences the consequences (benefits and bottlenecks) of AI-native development before the pattern is widespread.

New bottlenecks — The constraints that replace engineering velocity when engineering velocity is no longer the limiting factor. Krieger identifies two: upstream (deciding what to build, alignment on priorities) and downstream (merge queue, getting changes coherently into production). The critical path has moved from implementation to judgment and deployment.

Product × research embedding — The highest-leverage PM role at Anthropic is inside the model training and post-training work. Products that are genuinely differentiated (Artifacts) emerged from the intersection of post-training decisions and product design, not from UX work on top of a fixed model.

MCP (Model Context Protocol) — Anthropic’s open protocol for connecting AI models to external context and tools. Krieger maps it as the solution to the under-invested middle layer: model intelligence (top layer), context and memory (middle), applications and UI (bottom). MCP gives the middle layer a standard interface.

LLM-friendly codebase — A codebase structured for interpretability by a language model. Krieger describes Anthropic’s codebase as written with the constraint that Claude should be able to read and edit it: clearly named functions, consistent patterns, minimal magic. This is a design constraint analogous to mobile-first or accessibility-first.

Comprehensibility — Krieger’s first durable PM skill: closing the gap between what a model can do and how most people actually use it. The task is to translate latent capability into legible, accessible interaction. This is not documentation; it is product design that makes capability discoverable.

90% AI-written code: the claim

Anthropic’s internal measurement: over 90% of the code in the company is written by AI. The Claude Code team is estimated at 95%+ — the tool is substantially built by itself.

The organisational implication Krieger draws: Anthropic is ‘patient zero’ for this way of working. It experiences the transition not as a productivity tool but as a structural change to how the company operates. The comparison is not ‘we use AI to write code faster’ but ‘we have reorganised around the assumption that engineering velocity is no longer the constraint.’

Code review has changed. The review approach is now acceptance testing rather than line-by-line reading: does this change do what it should do? Claude also reviews PRs. The merge queue has become a bottleneck because individual velocity has increased but collective deployment coherence has not scaled proportionally.

The codebase is deliberately LLM-friendly — structured so Claude can interpret and modify it. This is a design choice, not an emergent property. The analogy: mobile-first design constrained how teams thought about interfaces in 2010; LLM-friendly design is the analogous constraint for 2025.
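A minimal sketch of what the glossary's "clearly named functions, consistent patterns, minimal magic" might look like in practice. This is a hypothetical illustration, not code from Anthropic's codebase: the class names, the function name, and the discount rule are all invented for the example.

```python
from dataclasses import dataclass

# Hypothetical example: every name and the pricing rule are invented.

# Harder to interpret: behaviour is hidden behind dynamic attribute lookup.
class Handler:
    def __getattr__(self, name):
        # "Magic": any attribute starting with "do_" silently becomes a callable.
        if name.startswith("do_"):
            return lambda x: x * 0.9 if name == "do_disc" else x
        raise AttributeError(name)

# Easier to interpret: explicit types, descriptive names, one obvious code path.
@dataclass(frozen=True)
class Order:
    subtotal_cents: int

MEMBER_DISCOUNT_RATE = 0.10

def apply_member_discount(order: Order) -> int:
    """Return the order total in cents after the member discount."""
    discount_cents = round(order.subtotal_cents * MEMBER_DISCOUNT_RATE)
    return order.subtotal_cents - discount_cents
```

Both versions compute a 10% discount, but only the second can be understood by reading a single function: the name states the intent, the types state the units, and nothing depends on a string-matching convention defined elsewhere.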

New bottlenecks: where the critical path moved

Pre-AI: the bottleneck was implementation. Could you build it? Engineering velocity determined how fast you could test ideas.

Post-AI at Anthropic: two new bottlenecks.

Upstream bottleneck — alignment: deciding what to build. When everyone can implement almost anything, the scarcest resource is clarity about what to implement. The PM’s job of defining and prioritising has increased in value, not decreased. Krieger describes this as alignment work — ensuring the team is building toward the same Column B.

Downstream bottleneck — deployment coherence: merge queue and shipping. Individual engineers (human or AI) are producing changes faster than the collective deployment pipeline can absorb them coherently. The constraint has moved from writing code to getting code into production without introducing regressions or incoherence.
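The downstream bottleneck can be illustrated with a toy queue model: once changes arrive faster than the merge queue can land them, the backlog grows without bound. This is a sketch under stated assumptions, not a description of Anthropic's pipeline. The function name and all the rates below are hypothetical and chosen only to show the shape of the problem.

```python
def simulate_merge_backlog(changes_per_day: int, merges_per_day: int, days: int) -> list[int]:
    """Return the end-of-day backlog of unmerged changes for each day."""
    backlog = 0
    history = []
    for _ in range(days):
        backlog += changes_per_day               # new changes arrive
        backlog -= min(backlog, merges_per_day)  # queue drains at a fixed capacity
        history.append(backlog)
    return history

# Hypothetical numbers: authoring speed triples, merge capacity stays the same.
before = simulate_merge_backlog(changes_per_day=40, merges_per_day=50, days=5)
after = simulate_merge_backlog(changes_per_day=120, merges_per_day=50, days=5)
print(before)  # [0, 0, 0, 0, 0]
print(after)   # [70, 140, 210, 280, 350]
```

In this simplified model, raising authoring speed does nothing for shipped output once the queue is saturated; only raising merge capacity does, which matches the point that investment shifts toward deployment infrastructure.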

The implication: the classical engineering management investment (hiring more engineers, improving IDE tooling) is increasingly misallocated. The investment that matters is in alignment processes and deployment infrastructure.

Product × research embedding

Krieger’s claim about where the highest-leverage PM work is at a model company:

The products that are genuinely differentiated (Artifacts, Projects) did not emerge from UX work on top of a fixed model. They emerged from the intersection of post-training decisions and product design — choices made during training about what capabilities to develop, shaped by product thinking about what users need.

The practical implication: the PM role most worth having at Anthropic is inside the model training and post-training work, not in the product layer above a fixed model. This is the opposite of the typical software company where product and engineering are separated and product works on top of a stable platform.

Artifacts is the canonical example. The capability required post-training investment to develop; the product design shaped what that investment targeted. Neither alone would have produced the outcome.

MCP: the missing middle layer

Krieger maps the AI product stack as three layers:

Model intelligence — what the model can do
Context and memory — what the model knows about the user and the session
Applications and UI — how the model is invoked and presented

His diagnosis: the middle layer (context and memory) has been under-invested relative to the other two. Teams were rebuilding integrations from scratch because there was no standard protocol. MCP provides the standard.

The origin: Justin and David at Anthropic recognised they were solving the same integration problem repeatedly. MCP is the generalisation.

The vision Krieger describes: everything in Claude.ai (Projects, Artifacts, styles, memory) should be MCP-exposed so Claude can read from and write back to its own context. The protocol turns Claude’s context into a first-class programmable surface.
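To make "read from and write back to its own context" concrete, here is a simplified sketch of the message shape involved. MCP is built on JSON-RPC 2.0 and tools are invoked with a `tools/call` request. Everything else here is an assumption for illustration: the in-memory store, the tool names `read_note` and `write_note`, and the in-process dispatcher are hypothetical. They are not the official SDK and not real Claude.ai tools, and transport and initialisation are omitted.

```python
import json

# Hypothetical in-memory "context" store standing in for project memory.
project_notes: dict[str, str] = {"tone": "concise"}

def read_note(key: str) -> str:
    return project_notes.get(key, "")

def write_note(key: str, value: str) -> str:
    project_notes[key] = value
    return "ok"

# Tool names are invented for illustration; they are not real Claude.ai tools.
TOOLS = {"read_note": read_note, "write_note": write_note}

def handle_request(raw: str) -> str:
    """Dispatch a JSON-RPC 2.0 'tools/call' request to a registered tool."""
    request = json.loads(raw)
    if request.get("method") != "tools/call":
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "error": error})
    params = request["params"]
    text = TOOLS[params["name"]](**params["arguments"])
    result = {"content": [{"type": "text", "text": text}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})

def make_call(request_id: int, name: str, arguments: dict) -> str:
    """Build a 'tools/call' request as a JSON string."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
                       "params": {"name": name, "arguments": arguments}})

# Write a value into the context, then read it back through the same interface.
handle_request(make_call(1, "write_note", {"key": "audience", "value": "PMs"}))
reply = json.loads(handle_request(make_call(2, "read_note", {"key": "audience"})))
print(reply["result"]["content"][0]["text"])  # PMs
```

The point of the sketch is the uniformity: reading and writing context go through one request shape, so any client that speaks the protocol can use any store that exposes it, rather than each integration being rebuilt from scratch.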

Durable PM skills in an AI-native world

Krieger’s three skills:

Comprehensibility — Closing the gap between what models can do and how most people use them. The majority of users operate at a fraction of the model’s capability because the interface does not make the capability discoverable. The PM task is to design interfaces and defaults that close this gap. This is neither documentation nor marketing; it is product design that makes capability legible.

Strategy — Deciding where to play. When engineering velocity is no longer the constraint, the constraint is judgment about which capabilities to develop and which markets to enter. Strategy becomes the bottleneck, not execution.

Opening eyes — The ability to enlarge someone’s model of what is possible. Krieger describes this as showing someone something they did not know they needed to see. It is related to comprehensibility but distinct: comprehensibility is about existing capability; opening eyes is about future capability.

The skill whose value has decreased is not stated directly but implied: the implementation-adjacent PM skill of writing detailed specs and managing delivery timelines is less valuable when delivery is fast. The skills that remain are the ones AI cannot substitute: judgment, direction, and translation.

AI timelines: the personal reckoning

Krieger’s account of reading the AI 2027 paper: he had two tabs open simultaneously — the paper and a product strategy document. The paper’s scenario (AI systems achieving transformative capability within two to three years) was not a safety concern for him in that moment; it was a product strategy concern. His reaction: ‘Am I the character in the story?’

SWE-Bench as a concrete timeline marker: when Dario Amodei predicted 90% SWE-Bench performance by end of 2025, the top score on the benchmark was 50%. By the time of the interview, newer models had reached 72%. Krieger treats this as a concrete progress indicator rather than an abstract claim.
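The three SWE-Bench figures above can be turned into a single progress fraction. This is only arithmetic on the numbers quoted in these notes. The function name is hypothetical, and treating the gap as linear is an assumption, since benchmark gains often get harder near the top.

```python
def fraction_of_gap_closed(baseline: float, current: float, target: float) -> float:
    """Share of the distance from baseline to target that has been covered."""
    return (current - baseline) / (target - baseline)

# Percentages as quoted in the notes: 50% at prediction time, 72% at interview, 90% target.
gap_closed = fraction_of_gap_closed(baseline=50, current=72, target=90)
print(f"{gap_closed:.0%}")  # 55%
```

On these figures, a little over half of the predicted improvement had been realised by the time of the interview, with 18 percentage points remaining.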

The personal analogue: raising children in this environment. Krieger’s framing — nurturing curiosity, the scientific process of discovery, independent thinking — is a response to the question of how to parent when AI can produce any answer on demand. The concern is not information access but epistemic agency: not delegating cognition entirely to AI.