Notes — Aishwarya and Kiriti on AI Products

Four questions [Adler frame]

Q1 — What is it about?
A practitioner framework for building AI products that don’t fail spectacularly in production. The central argument is that AI products differ from traditional software in two deep ways — non-determinism and the agency-control trade-off — and these differences require a fundamentally different development lifecycle. The CCCD framework is their proposed solution.

Q2 — How is it argued?
Through case studies (customer support agents, underwriting systems, insurance pre-authorisation) and frameworks (the graduation ladder, the CCCD loop, the success triangle). The arguments are grounded in 50+ deployments — empirical rather than theoretical. The frameworks are proprietary (from their Maven course) and not independently validated.

Q3 — Is it true?
The core claims are well-grounded: non-determinism is a genuine engineering challenge; the agency-control trade-off is real; and starting with high control and graduating upward is safer than launching the fully autonomous V3 first. The ‘pain is the new moat’ framing is directionally correct, since workflow-specific knowledge is hard to replicate, but it may be overconfident: as models improve, generic agents may become capable enough that workflow-specific calibration yields a diminishing advantage. The CCCD framework is sound as a process recommendation; its analogy to CI/CD is a useful teaching device rather than a technical equivalence.

Q4 — What of it?
The agency-control trade-off is an important concept missing from the wiki. The CCCD framework is a useful lens for understanding why companies fail at AI product deployment. The evals discussion adds important nuance (semantic diffusion) that complements the existing Evals page. ‘Pain is the new moat’ is worth quoting as a counterweight to the more breathless ‘AI levels the playing field’ framing.

Glossary

Non-determinism — in AI products, both the user-facing input and the model output are non-deterministic. Unlike traditional software where clicking ‘Book’ produces a predictable result, the same natural-language prompt may produce different outputs, and users will phrase the same intent in countless ways. [§ Two fundamental differences]

Agency-control trade-off — every increment of autonomy granted to an AI agent corresponds to a reduction in human oversight. Trust must be earned through demonstrated reliability. [§ Agency-control trade-off] See Agency-Control Trade-off.

CCCD (Continuous Calibration, Continuous Development) — Aishwarya and Kiriti’s AI development lifecycle framework. Right loop: scope → curate data → set up → design metrics → deploy. Left loop: observe production → spot error patterns → apply fixes → design new metrics → decide whether to graduate. Analogous to CI/CD, but for behaviour rather than code. [§ CCCD]
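A minimal Python sketch of one pass through the left (calibration) loop, assuming production traces have already been labelled with an error category during review. The names `Trace`, `EvalSet` and `calibration_pass` are hypothetical illustrations, not part of the CCCD course material. Failures are folded into the eval set as regression cases and the function reports which error patterns are new; the graduate-or-not decision is deliberately left to a human, in line with the source's point that there is no numerical rule.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Trace:
    """One production interaction, tagged by a reviewer (hypothetical schema)."""
    input_text: str
    output_text: str
    error_label: Optional[str] = None  # None means no problem was found


@dataclass
class EvalSet:
    """Accumulated regression cases plus the error patterns seen so far."""
    cases: list[Trace] = field(default_factory=list)
    known_patterns: set[str] = field(default_factory=set)


def calibration_pass(traces: list[Trace], evals: EvalSet) -> dict:
    """Observe -> spot error patterns -> grow the eval set -> report novelty.

    Deliberately stops short of deciding whether to graduate: the source
    treats that as a judgement call, not a threshold.
    """
    failures = [t for t in traces if t.error_label is not None]
    patterns = Counter(t.error_label for t in failures)
    new_patterns = set(patterns) - evals.known_patterns

    # Every observed failure becomes a regression case for future metrics.
    evals.cases.extend(failures)
    evals.known_patterns |= set(patterns)

    return {
        "failure_rate": len(failures) / len(traces) if traces else 0.0,
        "patterns": dict(patterns),
        "new_patterns": sorted(new_patterns),
    }


# Usage: the second pass surfaces no new error categories.
evals = EvalSet()
first = calibration_pass(
    [Trace("q1", "a1", "wrong_policy"), Trace("q2", "a2"), Trace("q3", "a3", "tone")],
    evals,
)
second = calibration_pass([Trace("q4", "a4", "tone")], evals)
print(first["new_patterns"])   # ['tone', 'wrong_policy']
print(second["new_patterns"])  # []
```

A pass that surfaces no new patterns is one concrete form of the ‘diminishing new information’ signal, but in this sketch it remains an input to a human judgement rather than an automatic trigger.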

Behaviour calibration — the process of narrowing the gap between expected and actual AI system behaviour through iterative deployment, observation, and adjustment. Behaviour calibration, not feature shipping, is the central activity of AI product development. [§ CCCD]

Semantic diffusion — a term Aishwarya borrows from Martin Fowler for a word that different communities overload until it loses meaning. She applies it to ‘evals’, which now variously refers to data-labelling annotations, LLM judges, model benchmarks, and product evaluation datasets. [§ Evals]

Pain is the new moat — Kiriti’s framing: the companies building durable AI products have gone through the pain of iterating on failure modes specific to their domain. This accumulated knowledge is not replicable. [§ Pain is the new moat]

Key sections

The graduation ladder [§ Graduation ladder]

The practical implementation of the agency-control trade-off: a three-stage progression (V1 suggest → V2 draft → V3 act) applied consistently across domains. The key insight, and the one most often missed, is that each lower stage generates implicit training data for the next. In particular, logging human edits to AI drafts at V2 is near-free error analysis: the edits show exactly what was wrong, and they can be turned directly into evaluation data.

This is the most immediately actionable recommendation in the episode. [?] Whether the ‘four to six months’ estimate for replacing a critical workflow holds depends heavily on data quality and workflow complexity, but the directional point (no one-click agents; plan for realistic timelines) is sound.
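The V2 edit-logging idea described above can be sketched as follows, assuming each review stores the prompt, the AI draft, and the text the human actually sent. `DraftReview`, `to_eval_record` and the 0.9 threshold are illustrative assumptions. The similarity score comes from Python's standard-library `difflib.SequenceMatcher`, used here only as a crude proxy for edit size, not as anything the source prescribes.

```python
import difflib
from dataclasses import dataclass


@dataclass
class DraftReview:
    """A V2 interaction: what the AI drafted and what the human actually sent."""
    prompt: str
    ai_draft: str
    human_final: str


def to_eval_record(review: DraftReview, accept_threshold: float = 0.9) -> dict:
    """Turn one reviewed draft into an eval record (illustrative schema).

    SequenceMatcher.ratio() is a crude proxy for how much the human changed;
    the 0.9 cut-off is an arbitrary assumption, not a recommended value.
    """
    similarity = difflib.SequenceMatcher(
        None, review.ai_draft, review.human_final
    ).ratio()
    diff = list(
        difflib.unified_diff(
            review.ai_draft.splitlines(),
            review.human_final.splitlines(),
            fromfile="ai_draft",
            tofile="human_final",
            lineterm="",
        )
    )
    return {
        "input": review.prompt,
        "expected": review.human_final,  # the human's version is the reference
        "rejected": review.ai_draft,     # the draft shows what went wrong
        "similarity": round(similarity, 3),
        "label": "accepted" if similarity >= accept_threshold else "edited",
        "diff": diff,                    # line-level record of the correction
    }


record = to_eval_record(
    DraftReview(
        prompt="Pre-authorisation request (hypothetical example)",
        ai_draft="Your claim is denied.",
        human_final="Your claim is approved pending documents.",
    )
)
print(record["label"])  # edited
```

Keeping both the rejected draft and the human's final text means each record can later serve as a regression case or as a labelled example for an LLM judge, without any extra annotation work.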

CCCD: when to graduate [§ CCCD: when to graduate]

The criterion for graduating to the next agency level: ‘when new calibration cycles produce diminishing new information.’ This is a vibes-based threshold — there is no numerical rule. Aishwarya acknowledges this and frames it as minimising surprise rather than hitting a metric. Worth noting: external events (model deprecations, shifts in user behaviour) can reset the calibration even at V3.

The success triangle [§ Success triangle]

The most commonly neglected element is culture, not technology. The specific failure mode: subject matter experts disengaging because they feel their jobs are threatened. This is a self-defeating spiral — the people best positioned to calibrate AI behaviour refuse to participate because they fear replacement. Leaders who frame AI as ‘augmentation not replacement’ actually get better AI, because they retain the human expertise needed for calibration.

Multi-agent misunderstanding [§ Overrated/Underrated]

Kiriti’s taxonomy of what works and doesn’t in multi-agent systems:

Works: supervisor agent orchestrating subagents (one coordinator, many workers).
Works: human orchestrating multiple single-purpose agents.
Doesn’t work well: peer-to-peer gossip protocol between agents (agent A tells agent B tells agent C). Hard to control what the customer sees; guardrails multiply; coordination overhead explodes.

This is a useful practical correction to the ‘just build a multi-agent system’ pattern that gets promoted without architectural specificity.
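A minimal sketch of the supervisor pattern Kiriti says works, assuming subagents can be modelled as plain functions. `Supervisor`, `plan` and `guardrail` are hypothetical names. The structural point is that workers never message each other: the supervisor dispatches each one and assembles the reply, so a single guardrail at the one exit point governs what the customer sees, rather than guardrails multiplying along a gossip chain.

```python
from typing import Callable

Worker = Callable[[str], str]  # a subagent modelled as a plain function (assumption)


class Supervisor:
    """One coordinator, many workers; workers never talk to each other."""

    def __init__(
        self,
        workers: dict[str, Worker],
        plan: Callable[[str], list[str]],
        guardrail: Callable[[str], bool],
    ) -> None:
        self.workers = workers
        self.plan = plan            # decides which workers to call, in order
        self.guardrail = guardrail  # single check on the customer-facing reply

    def handle(self, task: str) -> str:
        # Each worker gets the task from the supervisor only: no A -> B -> C chain.
        results = [self.workers[name](task) for name in self.plan(task)]
        reply = " ".join(results)
        # The only exit point, so one guardrail governs what the customer sees.
        return reply if self.guardrail(reply) else "Escalated to a human agent."


support = Supervisor(
    workers={
        "lookup": lambda task: "Order 42 shipped.",
        "refund": lambda task: "Refund issued.",
    },
    plan=lambda task: ["lookup"] if "where" in task else ["lookup", "refund"],
    guardrail=lambda reply: "Refund" not in reply,  # e.g. refunds need a human
)
print(support.handle("where is my order"))  # Order 42 shipped.
print(support.handle("refund my order"))    # Escalated to a human agent.
```

The same shape covers the second working pattern: replace the `plan` function with a human choosing which single-purpose agent to invoke.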

Cross-references

Agency-Control Trade-off — concept page created from this source
Evals — updated with semantic diffusion section
Agentic Engineering — CCCD is a structured approach to the same problems
Cat Wu on AI Product — complementary product leadership perspective
Nick Turley on ChatGPT — evals as lingua franca, complementary demystification
Boris Cherny on Claude Code — three-layer safety model addresses the same underlying problem at the model level