Token Economics

The study of the supply, demand, pricing, and distribution of AI inference tokens — the fundamental unit of consumption for large language models. As demand for AI capabilities has compounded, tokens have shifted from a commodity (cheap, abundant, broadly accessible) towards a strategically scarce resource whose allocation has material effects on competitive position.

Demand explosion

The primary demand signal is the compounding discovery of new use cases. Dylan Patel’s firm, SemiAnalysis, illustrates the trajectory: ~$10K/year in AI spend in 2024 rose to an annualised run-rate of $7M/year by mid-2025, a 700× increase driven not by one flagship application but by dozens of people independently building novel tools. Token spend at information-intensive firms, already above 25% of salary costs and rising, is on track to exceed headcount costs within 12-18 months.

The driver is what Patel calls ‘Claude psychosis’ or ‘AI psychosis’: once a non-technical person discovers they can build a working application in days by prompting an AI, their spending grows non-linearly. The discovery then spreads through firms and industries like a contagion.
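The arithmetic behind the spend figures in this section can be checked with a short sketch. It assumes salary costs stay flat and token spend compounds at a constant monthly rate; the monthly rates below are hypothetical inputs chosen for illustration, not figures from Patel.

```python
import math


def growth_multiple(start_spend: float, end_spend: float) -> float:
    """Return how many times larger end_spend is than start_spend."""
    return end_spend / start_spend


def months_until_parity(spend_share: float, monthly_growth: float) -> float:
    """Months until token spend equals salary costs.

    spend_share: token spend today as a fraction of salary costs (0.25 = 25%).
    monthly_growth: ASSUMED compound monthly growth rate of token spend,
        with salary costs held flat (a simplification, not a source figure).
    """
    if not 0 < spend_share < 1:
        raise ValueError("spend_share must be between 0 and 1")
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive")
    # Solve spend_share * (1 + g)^m = 1 for m.
    return math.log(1 / spend_share) / math.log(1 + monthly_growth)


if __name__ == "__main__":
    # $10K/year in 2024 to a $7M/year run-rate in mid-2025.
    print(f"growth multiple: {growth_multiple(10_000, 7_000_000):.0f}x")
    # Hypothetical monthly growth rates, starting from 25% of salary costs.
    for rate in (0.08, 0.10, 0.12):
        months = months_until_parity(0.25, rate)
        print(f"{rate:.0%} monthly growth -> parity in {months:.1f} months")
```

Under these assumptions, the 12-18 month window corresponds to token spend compounding at roughly 8% to 12% per month.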

Phantom GDP

A structural blind spot in conventional economic measurement. When AI reduces production costs, output rises, but prices for that output fall too, because competition drives prices toward the new, lower costs. The result: real economic value created (faster output, broader access to information, better decisions) does not appear in GDP, which records transaction prices rather than welfare.

Example: an economist using AI to do the work of a 200-person department produces the same analysis but sells it at a price that reflects the new lower cost, not the old labour cost. GDP registers a smaller transaction even though real productive output is unchanged or higher, so measured GDP shrinks while actual output holds steady or grows. Patel calls this phantom GDP: economically real but statistically invisible.
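A toy calculation makes the divergence concrete. The report counts and prices below are hypothetical, and the sketch compares raw transaction value with output volume only; it ignores the price-index adjustments that statistical agencies apply when estimating real GDP.

```python
def measured_vs_actual(units_before: float, price_before: float,
                       units_after: float, price_after: float) -> dict[str, float]:
    """Compare the change in transaction value with the change in output volume."""
    value_before = units_before * price_before
    value_after = units_after * price_after
    return {
        # What a transaction-based measure would register.
        "transaction_value_change": value_after / value_before - 1,
        # What was actually produced and delivered.
        "output_change": units_after / units_before - 1,
    }


if __name__ == "__main__":
    # HYPOTHETICAL numbers: 100 reports at $10,000 each before AI,
    # 150 reports at $1,000 each after AI lowers the cost of producing them.
    result = measured_vs_actual(100, 10_000, 150, 1_000)
    print(f"transaction value: {result['transaction_value_change']:+.0%}")
    print(f"output volume:     {result['output_change']:+.0%}")
```

With these inputs the recorded transaction value falls by 85% while the number of reports delivered rises by 50%.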

The three problems of token capture

Patel separates three problems that together determine whether an entity benefits from cheap tokens:

Using tokens — running inference, generating outputs
Generating value from tokens — translating outputs into economically useful products, insights, or decisions
Capturing value — commercialising and defending the output so that value accrues to you rather than customers or competitors

Most actors solve only the first problem. The ‘boring lazy way’ is to finish eight hours of work in one hour and stop there. The economically productive path is to do 8× the work in the same eight hours and sell proportionally more. Failing to solve all three leads to what Patel calls the permanent underclass: entities that use tokens but generate no net economic advantage from doing so.
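The difference between the two paths is simple arithmetic. The sketch below assumes an 8× speed-up and a hypothetical baseline of $1,000 of revenue per eight-hour day. The `price_retention` parameter is an illustrative stand-in for the capture problem (the fraction of the old per-unit price the seller still earns); it is not Patel's term.

```python
def daily_revenue(speedup: float, hours_worked: float,
                  baseline_revenue: float = 1_000.0,
                  baseline_hours: float = 8.0,
                  price_retention: float = 1.0) -> float:
    """Revenue for one day of AI-assisted work.

    speedup: how many times faster the work gets done with AI.
    hours_worked: hours actually worked that day.
    baseline_revenue: HYPOTHETICAL revenue from one pre-AI working day.
    price_retention: ASSUMED fraction of the old per-unit price still captured.
    """
    # Output relative to one pre-AI working day.
    output_multiple = speedup * hours_worked / baseline_hours
    return baseline_revenue * output_multiple * price_retention


if __name__ == "__main__":
    print("lazy path (1 hour):       ", daily_revenue(8, 1))
    print("productive path (8 hours):", daily_revenue(8, 8))
    print("productive, half price:   ", daily_revenue(8, 8, price_retention=0.5))
```

The lazy path leaves revenue where it started; the productive path multiplies it only to the extent that prices hold, which is why capture is treated as a separate problem.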

Token access as competitive moat

Frontier models command premium prices but generate disproportionate economic value. The gap between ‘what can be done with GPT-4-class models’ and ‘what can be done with Opus-4.7-class models’ is large and growing. This makes enterprise access to frontier models (guaranteed rate limits, preferential release access) a structural competitive advantage.

Access is rationed by compute capacity. Anthropic’s gross margins went from ~35% in early 2025 to 72%+ mid-year not because of price increases but because demand grew faster than compute supply could accommodate. The model labs are effectively sold out and must allocate their remaining capacity to the highest-value use cases, creating winner-take-most dynamics among enterprise clients.

Patel’s hypothetical: an investment firm that pre-buys the first $10B of tokens for every new model release gets exclusive access during the period of highest capability premium. That firm crushes competitors in every information-sensitive market.

Supply constraints

AI infrastructure supply chains are constrained across multiple layers simultaneously, and lead times are measured in years, not months:

DRAM/memory: Capacity grows only 20-30%/year. Decisions made in late 2025 produce incremental supply no earlier than 2028. Prices are set to double or triple from 2025 levels.
Logic (TSMC): CapEx approaching $57B in 2025; may reach $100B by 2028. Upstream equipment suppliers (ASML, Applied Materials, Lam Research) will experience tail-whip demand as TSMC’s expansion compounds.
GPUs: Useful lives are extending to 7-8 years (not the initially assumed 5); H100 clusters are being re-signed for additional 3-4 year terms at higher prices. Gross margins are expanding on the same hardware.
CPUs: Driven by (1) RL training environments, which run on CPUs; (2) deployed code/applications generated by AI, which ultimately runs on CPU-backed cloud instances.
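Two of the figures in this list lend themselves to quick arithmetic: how slowly 20-30% annual capacity growth compounds, and how a longer useful life lowers the annual cost of the same GPU. The sketch below is illustrative only. The 3-year horizon, the 100%/year demand growth rate, and the $30,000 purchase price are hypothetical placeholders, and straight-line depreciation is an assumed accounting method.

```python
def compound(base: float, annual_growth: float, years: int) -> float:
    """Value of `base` after compounding at `annual_growth` for `years` years."""
    return base * (1 + annual_growth) ** years


def straight_line_annual_cost(purchase_price: float, useful_life_years: float) -> float:
    """Annual depreciation charge under straight-line depreciation."""
    return purchase_price / useful_life_years


if __name__ == "__main__":
    years = 3  # HYPOTHETICAL horizon
    for growth in (0.20, 0.30):
        supply = compound(1.0, growth, years)
        print(f"capacity at {growth:.0%}/year after {years} years: {supply:.2f}x")

    # HYPOTHETICAL demand growth of 100%/year, for comparison only.
    print(f"demand at 100%/year after {years} years: {compound(1.0, 1.0, years):.2f}x")

    gpu_price = 30_000.0  # HYPOTHETICAL purchase price in dollars
    for life in (5, 8):
        cost = straight_line_annual_cost(gpu_price, life)
        print(f"{life}-year life: ${cost:,.0f} depreciation per year")
```

Any demand growth rate well above 30% a year opens a gap that widens every year, and stretching the same hardware from a 5-year to an 8-year life cuts its annual depreciation charge by more than a third.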

The consequence: value migrates upward to those who secured early contracts, to hardware manufacturers whose prices are rising, and to model labs whose demand exceeds supply. Those with commodity access to tokens face margin compression; those with preferential access benefit from the scarcity premium.

Premium token segmentation

Jensen Huang’s 2026 Dwarkesh interview introduces a demand-side segmentation not captured in Patel’s supply-side analysis. Inference tokens are not homogeneous: some customers will pay a higher price for response-time tokens (low latency — fast time to first token) even at lower throughput, whilst others optimise for throughput tokens (maximum tokens per second per dollar). Huang’s analogy is commercial aviation: economy class (throughput) and first class (latency) at different price points, both profitable, on the same aircraft.

This segmentation expands the Pareto frontier for Nvidia and the model labs simultaneously. Groq (acquired by Nvidia) was built specifically for low-latency inference; the acquisition signals Nvidia’s intention to serve both market segments from a unified platform rather than ceding the premium-latency segment to specialised ASICs.

The practical implication for token economics: the effective price per token is not a single number but a schedule across the latency–throughput frontier. Enterprise applications that are latency-sensitive (real-time voice, agentic tools with a human in the loop, trading) will pay at the high end; batch processing and asynchronous workloads will pay at the low end. Total revenue per GPU-hour rises as the mix of premium-latency demand increases.
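The mix effect can be sketched as a weighted average of two serving tiers. All prices and token rates below are hypothetical placeholders, not quoted figures. The sketch also assumes the premium tier earns more per GPU-hour than the throughput tier despite serving fewer tokens, which is the condition under which a richer premium mix raises blended revenue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    million_tokens_per_gpu_hour: float  # HYPOTHETICAL serving rate
    price_per_million_tokens: float     # HYPOTHETICAL price in dollars

    @property
    def revenue_per_gpu_hour(self) -> float:
        return self.million_tokens_per_gpu_hour * self.price_per_million_tokens


def blended_revenue(premium: Tier, economy: Tier, premium_share: float) -> float:
    """Revenue per GPU-hour when `premium_share` of GPU-hours serve the premium tier."""
    if not 0.0 <= premium_share <= 1.0:
        raise ValueError("premium_share must be between 0 and 1")
    return (premium_share * premium.revenue_per_gpu_hour
            + (1.0 - premium_share) * economy.revenue_per_gpu_hour)


# Low-latency serving: fewer tokens per GPU-hour, higher price per token.
LATENCY = Tier("response-time", million_tokens_per_gpu_hour=3.0, price_per_million_tokens=15.0)
# Throughput serving: more tokens per GPU-hour, lower price per token.
THROUGHPUT = Tier("throughput", million_tokens_per_gpu_hour=10.0, price_per_million_tokens=2.0)

if __name__ == "__main__":
    for share in (0.0, 0.25, 0.5):
        revenue = blended_revenue(LATENCY, THROUGHPUT, share)
        print(f"premium share {share:.0%}: ${revenue:.2f} per GPU-hour")
```

If the premium tier's price did not compensate for its lower throughput, the same calculation would show blended revenue falling as the premium share rose.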

See Jensen Huang on Nvidia's Supply Chain Moat, Accelerated Computing vs TPUs, and the China Chip Debate.

Where mainstream views differ

Abundant token future: Some argue that competition between model labs (Anthropic, OpenAI, Google) will drive token prices to near zero through commoditisation, eliminating the access premium. Patel’s counter: even if tier-1 labs were commoditised, the constraint is physical compute — fabs, memory, power — not business strategy. Supply cannot be conjured faster than fabs can be built.

GDP measurement: Conventional economists would note that GDP measures market transactions, not welfare, and that similar productivity improvements occurred during prior technology waves (electrification, internet) without prompting a redefinition of GDP. Patel’s ‘phantom GDP’ may simply be the standard measurement lag between productivity improvement and measured output. Whether AI’s deflationary effects are larger or faster than those of prior waves remains unresolved.

Sources

Dylan Patel on the Token Economy, AI Supply-Demand, and the Permanent Underclass — primary source; Patel’s direct observation of his firm’s token spend, the Phantom GDP concept, and supply chain analysis
Dylan Patel and Nathan Lambert on DeepSeek and China AI — adjacent; AI cost reduction from a geopolitical angle
Jensen Huang on Nvidia's Supply Chain Moat, Accelerated Computing vs TPUs, and the China Chip Debate — premium token segmentation and inference market structure
Scaling Laws — context on why frontier models continue to improve with compute, sustaining the value premium