Sherwin Wu on AI-Native Engineering, Wizard Managers, and the Sorcerer's Apprentice

Tags: transcript, lennys-podcast, ai-engineering, agentic-engineering, openai, codex

Sherwin Wu, Head of Engineering for OpenAI’s API and Developer Platform, on how engineering is changing at the world’s most AI-native company — from coding statistics to agent fleet management to what builders should prioritise as models keep improving.


Key ideas

  • Codex at OpenAI. 95% of engineers use Codex daily, and 100% of pull requests are reviewed by Codex. Engineers who use Codex open 70% more PRs than those who don’t, and the gap keeps growing.
  • Wizard managers. Software engineers are becoming managers of agent fleets, each running 10–20 Codex threads in parallel. The SICP “sorcerer’s apprentice” analogy applies: the power is enormous, but steering matters as much as summoning.
  • Don’t follow customers in fast-moving AI. “The models will eat your scaffolding for breakfast”: the field moves faster than customer feedback loops. Build for where models are going, not where they are.
  • Removing the escape hatch. An internal OpenAI team maintains a codebase written 100% by Codex, with no manual fallback. The constraint forces the team to encode context explicitly (documentation, code comments, AGENTS.md files), which addresses the underlying agent-steering problem.
  • Second-order B2B SaaS boom. The one-person billion-dollar startup creates demand for hundreds of bespoke supporting tools. Smaller software shops that currently fill enterprise niches will be disrupted, but the long tail of B2B SaaS is entering a golden age.

Engineering at OpenAI

Sherwin’s team builds the API and developer platform — the infrastructure that essentially every AI startup relies on. This gives him an unusually broad view of how companies are (and aren’t) deploying AI effectively.

Inside OpenAI, Codex has reached saturation: 95% of engineers use it daily, heavy users open 70% more PRs than light users, and 100% of PRs are reviewed by Codex. The key metric Sherwin tracks is not lines of code but PR throughput, which serves as a proxy for shipping velocity.
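As a rough illustration of how a PR-throughput gap like the one above could be computed, here is a minimal Python sketch. The function names and the per-engineer PR counts are hypothetical, chosen only so the arithmetic lands on 70%. They are not OpenAI data, and the source does not describe how the metric is actually calculated.

```python
from statistics import mean


def pr_throughput(prs_per_engineer):
    """Mean number of PRs opened per engineer in one measurement window."""
    return mean(prs_per_engineer)


def relative_gap(heavy, light):
    """Fractional uplift of the heavy-usage cohort over the light-usage cohort."""
    return pr_throughput(heavy) / pr_throughput(light) - 1.0


# Hypothetical PR counts per engineer (illustrative only, not real data).
heavy_users = [18, 16, 17]  # cohort mean: 17
light_users = [10, 9, 11]   # cohort mean: 10

gap = relative_gap(heavy_users, light_users)
print(f"Heavy users open {gap:.0%} more PRs than light users")
```

Measuring throughput per engineer rather than in total keeps the comparison fair when the two cohorts differ in size.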


The Sorcerer’s Apprentice Frame

The SICP textbook (Structure and Interpretation of Computer Programs, first published in 1985) described programming as sorcery: software engineers as wizards, programming languages as incantations. Sherwin argues this frame has now materialised literally: engineers cast spells (prompts), and agents go off and execute tasks. The current moment is the Sorcerer’s Apprentice stage: the hat is on, the brooms are running, but the old sorcerer (mature tooling and practices) hasn’t arrived yet.

The implication: the value of the engineer has shifted from writing code to steering agents. Context, prompt quality, and judgment determine output quality more than raw coding skill.


Building for Where Models Are Going

The most common AI product failure Sherwin observes is building scaffolding around the current model’s limitations. “The models will eat your scaffolding for breakfast”: by the time a product ships, the model has often improved past the limitation the scaffolding was designed to compensate for.

The corollary: listen to the model roadmap, not just the customer. In fast-moving AI, customer feedback loops operate at a slower cadence than the underlying capability curve.


See also