Notes — Cat Wu on AI Product

Notes on Cat Wu in conversation with Lenny Rachitsky — Lenny’s Podcast, 2025.

Note: the source is a raw podcast transcript with two speakers plus ad reads (WorkOS, Vanta), which are excluded from these notes. Speaker labels and timestamps are preserved in the source; these notes follow Cat Wu’s content only.

Four questions [Adler frame]

Q1 — What is it about as a whole? An insider account of product management at Anthropic’s Claude Code and Cowork team: how they ship at extreme velocity, how the PM role is evolving, and how to build effective products on top of rapidly changing model capabilities.

Q2 — How is it argued? First-person examples throughout: Cowork slide-deck generation, the to-do list feature lifecycle, sales-deck customiser app, the code review product’s multi-year journey to reliability with Opus 4.5/4.6. The argument is inductive — here is what we do, why it works, and what it means for the profession.

Q3 — Is it true, in whole or part? Claims about Anthropic’s process and product decisions are first-person and plausible. Specific model names (Opus 4.5, 4.6, Sonnet 4.6, ‘Mythos’) and shipping claims are consistent with public information. The ‘model eats the harness’ dynamic is well-documented in the agent-engineering literature. Product-taste-as-durable-skill is a reasonable extrapolation, though contested (some argue taste too is increasingly automatable).

Q4 — What of it? Three transferable lessons for anyone building on frontier models: (1) build evals before you think you need them; (2) ship in research preview to reduce commitment cost; (3) invest in eliciting current-model capability rather than waiting for the future model. The AGI-pilled calibration framing is a useful diagnostic for product teams losing grip on what models can do today.

Glossary

AGI-pilled — a colloquial term for believing strongly in imminent AGI; being ‘the right amount of AGI-pilled’ means accurately calibrating how capable current (not future) models are for product decisions.
Research preview — Anthropic’s label for early, potentially impermanent features; reduces commitment cost and enables fast iteration.
Evergreen launch room — Anthropic’s internal Slack channel for posting features ready to ship; triggers the 24-hour marketing/docs turnaround process.
Multi-Claudeing — running multiple Claude Code sessions in parallel; a major usage trend from late 2025 onward.
Harness — the system prompt, scaffolding, and prompt interventions surrounding the model; becomes simpler as models improve (‘model eats the harness’).
Evals — evaluations: concrete test cases measuring whether a feature works; Cat Wu argues PMs should write them.
Cowork — Anthropic’s product for non-code knowledge work; integrates with Slack, Gmail, Google Calendar, Drive; generates drafts, summaries, and slide decks.
Applied AI team — Anthropic’s technical customer-success function; heavy Cowork + Claude Code users.
Product taste — the ability to judge which of many possible things is worth building and how to build it; Cat Wu’s candidate for the most durable PM skill.
Thinking words — the playful status verbs Claude Code displays while it is working; Cat Wu’s favourite is ‘manifesting.’

Key claims by section

Cat Wu’s role [§ Role description]

Boris is the tech lead and product visionary (‘AGI-pilled version of the product, 3–6 months out’). [§ Role]
Cat Wu: charts the path from vision to reality and handles cross-functional alignment (marketing, sales, finance, capacity). [§ Role]
Split: ~80% mind-meld; the remainder divides into areas Cat drives and areas Boris drives. [§ Role]

PM role evolution [§ PM role changing]

Pre-AI: 6–12 month planning horizons; emphasis on multi-quarter roadmap alignment. [§ PM role]
Now: timelines down from 6 months → 1 month → 1 week → 1 day. [§ PM role]
Key PM skill shift: less roadmap alignment, more “how fast can we get this in users’ hands?” [§ PM role]

How to move fast [§ Moving fast]

Clear goals let the team rule out approaches early; e.g., ‘professional developers at enterprises, zero permission prompts.’ [§ Moving fast]
Research preview: ship early, clearly labelled; reduces commitment cost; can iterate in a week or two. [§ Moving fast]
Tight cross-functional process: engineering → evergreen launch room → marketing (Alex/PMM) + docs (Sarah) turn around announcement within 24 hours. [§ Moving fast]

PRDs [§ PRDs]

Weekly metrics readouts with the whole team so everyone understands what drives the business. [§ PRDs]
Team principles doc: key users, why, what we’re willing to trade off — enables autonomous decision-making. [§ PRDs]
PRDs still written for particularly ambiguous features or multi-month infrastructure projects. [§ PRDs]

Claude Code source code leak [§ Source leak]

Caused by human error in a PR (the author used Claude to write the PR description); the PR went through two layers of human review. [§ Source leak]
The person responsible is still at Anthropic; the incident was treated as a process failure rather than an individual one, and processes were hardened. [§ Source leak]

OpenClaw decision [§ OpenClaw]

Subscriptions were not designed for the usage patterns of third-party tools such as OpenClaw, which used subscription access to reach Claude. [§ OpenClaw]
Decision: prioritise first-party products and the API; credits were offered to ease the transition. [§ OpenClaw]
Motivated by the growth goal: more first-party users → broader reach for Anthropic’s mission. [§ OpenClaw]

PM team structure [§ PM team]

~30–40 PMs at Anthropic total across: research PM (Diane leads), Claude developer platform (CDP, Managed Agents), Claude Code/Cowork, enterprise, growth. [§ PM team]

Roles merging [§ Roles merging]

PMs are doing engineering work; engineers are doing PM work; designers are landing code. [§ Roles merging]
Anthropic focus: hire engineers with great product taste to minimise overhead. [§ Roles merging]
Many engineers can go from user feedback on Twitter to a shipped feature by the end of the week, with no PM involvement. [§ Roles merging]
Product taste ‘can come from any background’ but is ‘rare.’ [§ Roles merging]
Engineering background currently useful because it gives a sense of how hard something is → informs prioritisation. [§ Roles merging]
Cat Wu qualifies this as true ‘for the next few months’, a deliberate hedge, since large shifts happen every few months. [§ Roles merging]

What humans still provide [§ Human value]

Common sense about stakeholders: who they are, how they relate, what their preferences are, which venues to use. [§ Human value]
‘Tacit common sense EQ knowledge’: a gap in current models. [§ Human value]

Anthropic’s success ingredients [§ Anthropic success]

Mission: ‘bringing safe AGI to all of humanity’; referenced in product decisions across the org, which enables fast decisions that cut across teams. [§ Anthropic success]
Focus: product decisions are framed by which option best serves Anthropic’s mission, not by individual orgs’ KRs. [§ Anthropic success]
Cat Wu: ‘If Claude Code failed but Anthropic succeeded, I would be extremely happy.’ [§ Anthropic success]

Claude Code vs Cowork vs Claude web/mobile [§ Product surfaces]

CLI: most powerful; new features land there first; best for a one-off task or a handful of coding tasks. [§ Product surfaces]
Desktop: frontend work with preview pane; more graphical, less scary for non-technical users; at-a-glance view of all sessions. [§ Product surfaces]
Web/mobile: kick off tasks on the go without laptop. [§ Product surfaces]
Cowork: non-code outputs — Slack zero, inbox zero, slide decks, docs, launch plans. Rule: code output → Claude Code; everything else → Cowork. [§ Product surfaces]

Cowork demo — slide deck [§ Cowork demo]

Connected: Google Calendar, Slack, Gmail, Google Drive. [§ Cowork demo]
Prompt: the narrative she wanted to tell + the PMM draft + an old deck + ‘don’t overlap with keynote.’ [§ Cowork demo]
Cowork ran for ~1 hour: searched Twitter, the evergreen launch room and the Claude Code announced channel, then synthesised a 20-page deck. [§ Cowork demo]
Cat Wu’s role: reviewed the proposed outline and decided what belonged in the final deck, then let Cowork build it. [§ Cowork demo]
Result: ‘pretty good,’ ‘a few tweaks’ — faster than doing it manually. [§ Cowork demo]
Design system: she provided an example Anthropic slide deck, which Cowork uses as a template. [§ Cowork demo]

Internal tool building [§ Internal tools]

Sales rep built a web app: pulls customer context from Salesforce/Gong/notes → customises Claude Code decks (101/201/mastering) per customer needs (Bedrock vs. Claude Enterprise, HIPAA, security controls). [§ Internal tools]
20–30 minute task reduced to ‘a few seconds.’ [§ Internal tools]
Pattern: Claude Code lowers the barrier to making custom apps → surge in personalised work software. [§ Internal tools]

Model eating the harness [§ Harness evolution]

To-do list feature: added because early Claude Code would fix 5/20 call sites and stop. Adding a to-do list forced it to complete all 20. [§ Harness evolution]
With Opus 4 and later: model naturally uses a to-do list without being forced to. Prompting intervention removed. [§ Harness evolution]
Every new model launch: read through the entire system prompt and remove what the model no longer needs. [§ Harness evolution]
‘We can remove a lot of prompting interventions every time the model gets smarter.’ [§ Harness evolution]
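The to-do-list intervention can be sketched as a harness loop whose stop condition is an empty checklist. This is a minimal illustration under stated assumptions, not Claude Code’s implementation: it assumes the call sites are known up front, and `TodoList`, `fix_call_site` and `run_with_todo` are hypothetical names, with `fix_call_site` standing in for a single model edit.

```python
from dataclasses import dataclass, field


@dataclass
class TodoList:
    """Hypothetical checklist that the harness makes the agent work through."""
    items: dict = field(default_factory=dict)  # task name -> done?

    def add(self, task: str) -> None:
        self.items[task] = False

    def complete(self, task: str) -> None:
        self.items[task] = True

    def remaining(self) -> list:
        return [task for task, done in self.items.items() if not done]

    def is_done(self) -> bool:
        return not self.remaining()


def fix_call_site(site: str) -> bool:
    """Hypothetical stand-in for one model edit; always succeeds in this sketch."""
    return True


def run_with_todo(call_sites: list, max_steps: int = 100) -> TodoList:
    """Seed one checklist item per call site and keep working until none remain."""
    todo = TodoList()
    for site in call_sites:
        todo.add(site)
    steps = 0
    # The stop condition is "checklist empty", not "the agent feels finished",
    # which is what prevents stopping after 5 of 20 call sites.
    while not todo.is_done() and steps < max_steps:
        site = todo.remaining()[0]
        if fix_call_site(site):
            todo.complete(site)
        steps += 1
    return todo


sites = [f"call_site_{i}" for i in range(20)]
todo = run_with_todo(sites)
print(len(todo.remaining()))  # 0
```

The `max_steps` cap keeps a persistently failing edit from looping forever. Per the notes, once a model (Opus 4 and later) keeps its own to-do list, forced scaffolding of this kind is exactly what the per-launch system-prompt review can remove.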

New capabilities unlocked by new models [§ New capabilities]

Code review product: tried building it multiple times; earlier models not reliable enough to launch. [§ New capabilities]
With Opus 4.5, 4.6, Sonnet 4.6: code review is now reliable enough that the engineering team uses it as a gate before merging PRs. [§ New capabilities]
Runs multiple code review agents simultaneously, traverses the entire codebase, synthesises real issues. [§ New capabilities]
Lesson: ‘build products that don’t necessarily work yet’; when a new model lands, swap it into the prototype and see if it closes the gap. [§ New capabilities]

Product vision for Claude Code / Cowork [§ Vision]

Building block progression: single task → multi-Claudeing (6 at a time, major trend late 2025) → 50–100 Claudes in parallel (remote, not local). [§ Vision]
Infrastructure needed: interface to know which tasks need human attention; reliable verification so ‘done’ means done; self-improvement so feedback is incorporated permanently. [§ Vision]

Skills for the AI era [§ Advice]

Automate the repetitive parts; get automations to 100% (not 95% — that isn’t an automation). [§ Advice]
Build apps you actually use every day — prototype apps you never return to add no value. [§ Advice]
Don’t over-customise your setup at the expense of doing actual work: ‘Simple setups actually work better.’ [§ Advice]
Bias towards action: ‘just do things’; first principles thinking + clear optimisation target → deduce right action and do it. [§ Advice]

AGI-pilled calibration [§ AGI-pilled]

‘Very easy to build the product for the super AGI strong model.’ [§ AGI-pilled]
‘The hard thing is figuring out for the current model, how do you elicit the maximum capability? How do you guide users to interact with the model’s strengths and patch its weaknesses?’ [§ AGI-pilled]
Skills: (1) spend a lot of time using and talking with the model; (2) ask the model to introspect on unexpected behaviour; (3) find 5 trusted users who can articulate what makes a model/harness combination good; (4) write evals. [§ AGI-pilled]

Evals [§ Evals]

‘Even building 10 great evals is important for helping the team quantify what the goal is.’ [§ Evals]
Cat Wu personally writes evals for features that need more product definition. [§ Evals]
Memory is an example of a feature that benefits heavily from evals. [§ Evals]
Small pod collaborates with research to precisely measure Claude Code behaviours and identify areas of improvement. [§ Evals]
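A small eval suite can be sketched as a list of prompts, each paired with a pass/fail check, scored as a pass rate. This is a minimal sketch, assuming string-in/string-out model calls; `EvalCase`, `run_suite`, `toy_model` and the memory-style cases are hypothetical illustrations, not Anthropic’s eval tooling. Because the model is passed in as a parameter, the same suite can be rerun unchanged when a new model lands, in the spirit of swapping new models into the code review prototype.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def run_suite(model: Callable[[str], str], cases: list) -> float:
    """Run every case against `model`, print PASS/FAIL, return the pass rate."""
    passed = 0
    for case in cases:
        ok = case.check(model(case.prompt))
        print(f"{'PASS' if ok else 'FAIL'}  {case.name}")
        passed += int(ok)
    return passed / len(cases)


# Hypothetical memory-feature evals: does the assistant recall stated preferences?
cases = [
    EvalCase("recalls editor", "Which editor do I prefer?",
             lambda out: "vim" in out.lower()),
    EvalCase("recalls language", "Which language do I use?",
             lambda out: "python" in out.lower()),
]


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; only 'remembers' the editor."""
    return "You prefer Vim." if "editor" in prompt else "I'm not sure."


print(run_suite(toy_model, cases))  # PASS/FAIL lines, then 0.5
```

Ten cases with sharp `check` functions are enough to turn a vague product goal into a number the team can track from one model version to the next.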

Claude’s character [§ Character]

Amanda: moulds Claude’s character/constitution; the task is ambiguous because, unlike code, success in character cannot be verified automatically. [§ Character]
Properties people love: lighthearted + confident + low ego + positive + bias toward action + gives earnest feedback (not just agreeable). [§ Character]
‘Part of what makes a great coworker is this positivity, this bias towards action, this ability to give you earnest feedback.’ [§ Character]

Thinking words [§ Thinking words]

Cat Wu’s favourite thinking word: ‘manifesting.’ [§ Thinking words]
These appeared in the leaked Claude Code source code. [§ Thinking words]