Dylan Patel on the Token Economy, AI Supply-Demand, and the Permanent Underclass

Source: Invest Like the Best
Host: Patrick O’Shaughnessy
Speaker: Dylan Patel
Date: 2025 (recorded during Opus 4.7 launch week)

Key ideas

SemiAnalysis’s own token spend went from ~$10K/year to $7M/year in under six months — a 700× jump — with spend as a proportion of salary costs on track to exceed 100% by year-end. The driver was not one use case but dozens of people building novel internal applications across chip reverse-engineering, economic modelling, and energy grid analysis.
Token access is becoming the new competitive moat. Those with enterprise contracts and early model access will compound their advantages. The “permanent underclass” will be those who fail to use tokens, generate value from them, and capture that value: three distinct problems.
The supply side is constrained across the entire stack: GPU useful lives are extending to 7-8 years (not 5); DRAM prices are set to double or triple through 2028, because capacity decisions made now won’t yield supply until 2028; and TSMC CapEx may reach $100B by 2028, creating a tail-whip effect across all upstream equipment suppliers.
Phantom GDP: when AI slashes implementation costs, output prices also fall sharply, making the real economic value created by token use systematically invisible to GDP metrics. The value is real but uncaptured by conventional measurement.
Implementation was historically the scarce resource; ideas were cheap. That constraint has lifted: ideas are still cheap and plentiful, but implementation is now trivially cheap too. The only scarcity left is choosing which ideas justify the token spend, and having enough enterprise token access to implement them.

Summary

The token psychosis at SemiAnalysis

The episode opens with Dylan Patel describing his firm’s own token consumption as the most viscerally honest data point on AI demand. In 2024, SemiAnalysis spent low tens of thousands of dollars on AI. By mid-2025, spend had reached a $7M annualised rate (north of 25% of salary costs) and was growing week over week. The driver was not one showcase application but dozens of people at the firm building novel tools independently:

A chip reverse-engineering lab employee used a few thousand dollars of Claude tokens to build a GPU-accelerated application that automatically identifies materials (copper, tantalum, germanium, cobalt) in microscopy images of semiconductor stacks — replacing what the employee said would have been “an entire team’s job at Intel.”
Malcolm, an economist formerly at a major bank with a 100–200-person economics department, single-handedly built macro models linking BLS task data to AI replaceability, created a “Phantom GDP” metric, and developed a suite of 2,000 evals. He estimated the work would have taken a team of 200 economists a year.
Jeremy, leading data centre energy analysis, spent $6,000/day for three weeks to scrape every US power plant and transmission line, build a micro supply/demand map of the US grid, and create a dashboard that, in some respects if not all, beat a competitor that has had 100 people working on the problem for a decade.
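The spend figures quoted in this summary can be sanity-checked with simple arithmetic. The sketch below uses only numbers that appear in the text. Two things are assumptions: the 2024 baseline is taken as the ~$10K/year figure from the Key ideas section (the paragraph above says only "low tens of thousands"), and "three weeks" is read as 21 consecutive days of spend.

```python
# Back-of-envelope check of the spend figures quoted in this summary.
# Assumptions: a ~$10K/year 2024 baseline and a 21-day reading of "three weeks".

SPEND_2024_USD = 10_000               # ~$10K/year (Key ideas section)
SPEND_2025_RUN_RATE_USD = 7_000_000   # $7M annualised rate
SALARY_SHARE_FLOOR = 0.25             # spend is "north of 25%" of salary costs

growth_multiple = SPEND_2025_RUN_RATE_USD / SPEND_2024_USD

# If spend exceeds 25% of salary costs, salary costs must be below spend / 0.25.
implied_salary_ceiling_usd = SPEND_2025_RUN_RATE_USD / SALARY_SHARE_FLOOR

GRID_PROJECT_DAILY_USD = 6_000
GRID_PROJECT_DAYS = 21                # assumption: three full weeks
grid_project_total_usd = GRID_PROJECT_DAILY_USD * GRID_PROJECT_DAYS

print(f"Growth multiple: {growth_multiple:.0f}x")
print(f"Implied salary ceiling: ${implied_salary_ceiling_usd:,.0f}")
print(f"Grid project token cost: ${grid_project_total_usd:,}")
```

On these assumptions the grid project cost about $126,000 in tokens, and the "north of 25%" remark implies annual salary costs somewhere below $28M.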

The business logic: SemiAnalysis is in the information business. Anyone who moves faster and keeps improving the service will grow. Those who don’t will be commoditised.

Three problems of token capture

Patel offers a framework for why not everyone benefits from cheap AI:

1. Using tokens: simply generating them, running inference, getting outputs
2. Generating value from tokens: building applications, insights, or products that translate token output into economic output
3. Capturing value from token-generated value: commercialising, pricing, and defending the output so that value accrues to you rather than your customers

Most people he sees are solving problem 1 (using tokens) in the “lazy way”: finishing eight hours of work in one hour and coasting for the rest of the day. The economically productive path is to work the same eight hours but do 8× the work, sell more, and grow faster. “If you don’t do these three things, you’ll never escape the permanent underclass.”

Token access as competitive moat

The better the model, the higher the value generated per token. But frontier models are increasingly access-controlled through rate limits, enterprise contracts, and selective deployment. Patel’s example is Anthropic’s Mythos: the model was reportedly available internally since February 2025 but has not been released publicly, because Anthropic is sold out at current compute levels and doesn’t need to release it. The upshot is that whoever secures preferential model access gets a compounding advantage. In Patel’s hypothetical, a Ken Griffin of Citadel who signed a “$10B first-use agreement” for every new model release would crush competitors in information-intensive markets.

This concentrates the benefits of AI among fewer entities over time: “token usage and therefore the benefits of those tokens… aggregates among fewer and fewer and fewer companies.”

Phantom GDP

Malcolm’s “Phantom GDP” concept captures a structural blind spot in economic measurement. When AI makes production faster and cheaper, output grows — but so does price compression. A data set that used to sell for $X can now be produced for a fraction of the cost; competition drives the price toward that lower floor. GDP records only the transaction price, not the value to the buyer. The analyst who would have needed 200 economists to produce a report now does it alone; the report still sells, but at a lower price. Output per worker is up dramatically, but measured GDP shrinks or stays flat.
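A toy calculation can make the measurement gap concrete. This is only one way to read the idea, not Malcolm's actual metric, and every number in it is hypothetical. It values today's units at the old price, which is a deliberately crude upper bound, since many new buyers would not have paid the old price.

```python
# Toy illustration of "Phantom GDP". All figures are hypothetical, and this
# is NOT the metric described in the episode, only a sketch of the idea.

def measured_output(price: float, units: int) -> float:
    """Output as recorded at transaction prices (price times quantity)."""
    return price * units

def phantom_value(old_price: float, new_price: float, new_units: int) -> float:
    """Value of today's units at the old price, minus what was actually paid."""
    return (old_price - new_price) * new_units

# Hypothetical report: the price falls 90% while units sold rise 5x.
old_price, old_units = 100_000.0, 10
new_price, new_units = 10_000.0, 50

before = measured_output(old_price, old_units)
after = measured_output(new_price, new_units)
phantom = phantom_value(old_price, new_price, new_units)

print(f"Measured output before: ${before:,.0f}")
print(f"Measured output after:  ${after:,.0f}")
print(f"Unrecorded value (upper bound): ${phantom:,.0f}")
```

In this made-up case five times as many reports are delivered, yet measured output halves from $1,000,000 to $500,000. The gap between what buyers paid and what the same units cost before is the part that transaction prices do not record.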

Patel cannot model this side of the demand equation — “what is the phantom GDP?” — and regards it as the hardest problem in tracking the real economic impact of AI adoption.

Supply side

The supply analysis covers the full semiconductor stack:

Compute (GPUs/ASICs): Useful lives extending to 7-8 years. H100 clusters are re-signing for additional 3-4 year terms. Margins expanding because renewal prices are rising on the same hardware.

Memory: DRAM and NAND capacity can grow only 20-30% per year; incremental decisions made in late 2025 don’t yield supply until 2028. Prices “will double or triple again” because “the only way to steal capacity from somewhere else in a capitalist economy is demand destruction via higher pricing.”
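Compounding the quoted growth range shows how little room supply has. The sketch below assumes a three-year window (2025 to 2028) and a steady growth rate in each year; both are simplifying assumptions, not figures from the episode.

```python
# Compound the 20-30% annual capacity growth quoted above.
# Assumptions: a three-year window and the same growth rate every year.

YEARS = 3

def capacity_multiple(annual_growth: float, years: int = YEARS) -> float:
    """Total capacity relative to today after compounding annual growth."""
    return (1 + annual_growth) ** years

low = capacity_multiple(0.20)
high = capacity_multiple(0.30)

print(f"Capacity after {YEARS} years: {low:.2f}x to {high:.2f}x today's level")
```

Under these assumptions capacity reaches roughly 1.7x to 2.2x today's level by 2028. If demand grows faster than that, price is the only remaining rationing mechanism, which is the argument quoted above.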

Logic (TSMC): CapEx approaching $57B in 2025; Patel projects $100B by 2028. This creates a tail-whip effect upstream — equipment suppliers (ASML, Applied Materials, Lam Research), optics, copper foil, glass fibre for PCBs — every niche supply chain in the stack is either sold out or receiving prepayments.
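The two CapEx figures imply a compound annual growth rate that is easy to work out. The sketch below takes $57B as the 2025 value and $100B as the 2028 value and assumes smooth growth between them, which is a simplification.

```python
# Implied annual growth for CapEx rising from $57B (2025) to $100B (2028).
# Assumption: smooth compound growth between the two quoted figures.

START_CAPEX_B = 57.0
END_CAPEX_B = 100.0
YEARS = 2028 - 2025

cagr = (END_CAPEX_B / START_CAPEX_B) ** (1 / YEARS) - 1

print(f"Implied CapEx growth: {cagr:.1%} per year")
```

The result is a little over 20% per year, sustained for three years, which gives a sense of the order growth that upstream suppliers would need to absorb.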

CPUs: Demand comes from two sources: (1) reinforcement learning environments, which run on CPUs (not GPUs) and score model trajectories; (2) deployed applications, since code and content generated by GPUs ultimately runs on CPU-backed servers. CPU capacity for both is sold out.

Model release cadence

Execution used to be the bottleneck. With AI, execution is cheap. The result: model research labs can now test more ideas per unit time — “implement more ideas and move on the treadmill faster and faster.” Anthropic went from targeting an L4 software engineer (achieved with Opus 4.6) to Mythos benchmarking at L6 in two months. The release cadence has compressed from six months to two months.

Patel’s framing: “What used to matter a lot was execution was very, very difficult and ideas were cheap. Now, ideas are cheap and plentiful, but execution is very easy. So, really only the good ideas are the ones that can justify the spend.”

Speakers

Dylan Patel — founder of SemiAnalysis; semiconductor infrastructure analyst; author of influential industry research on AI compute supply chains