Reading Notes

Gustav Söderström on Spotify, the Curation-to-Generation Shift, and Building at Scale

Source: Gustav Söderström on Spotify, the Curation-to-Generation Shift, and Building at Scale

Four questions [Adler frame]

Q1 — What is it about? The product and technology philosophy of Spotify’s co-president, CPO, and CTO, covering three topics at unusual depth: how internet platforms pass through three distinct paradigms (curation → recommendation → generation), with each transition requiring full product reimagination; how to design UIs that account for the real hit-rate of underlying ML; and how Spotify evolved from its famous squads model toward a more centralised org with autonomy concentrated at the VP level. A candid postmortem of a Spotify homepage redesign anchors the org-design and product-learning sections.

Q2 — How is it argued? Gustav reasons from first principles rather than frameworks, then names the framework after he has explained it. He uses empirical evidence (A/B test data, user research, traffic shifts) and independent analogues (Midjourney’s 4-image grid, Facebook News Feed reception, Apple vs Amazon org design) to validate each claim. The Swedish cultural lens (flat circles, Jantelagen-adjacent egalitarianism) appears implicitly in his preference for distributed yet structured decision-making.

Q3 — Is it true? The three-paradigm framing is broadly accurate as analytical history, though the transitions are messier than the clean progression implies (recommendation and generation coexist today). The fault-tolerant UI principle is independently confirmed by Midjourney’s documented design choices. The squads critique is corroborated by Spotify’s own public statements. The autonomy-at-VP-level claim is presented as the outcome of a multi-year experiment at Spotify.

Q4 — What of it? Product teams should: (1) treat generative AI as a new paradigm, not a feature addition — it will require full UX reimagination; (2) audit the hit-rate of every ML component and adjust UI multiplicity accordingly; (3) put autonomy at the VP level and resist both leaf-level dispersion and single-person bottlenecks; (4) differentiate between redesigns (involuntary for users, high cost) and new features (voluntary, contained cost) before launching; (5) separate the data signal from habit-break noise when a redesign draws negative feedback.


Glossary

Curation era: First phase of the internet. Users curate content — Facebook’s social graph, Spotify’s playlists. Human labour and choice drive content organisation.

Recommendation era: Second phase. Algorithms replace human curation. Spotify’s Discover Weekly, Netflix’s recommendation engine. Required full UX and business-model rethinking.

Generation era: Third phase, beginning now. Content is generated rather than selected. Gustav’s claim: this will be as large a shift as the transition to recommendation.

Zero intent use case: Spotify’s internal term for users who open the app without knowing what they want. Previously served (badly) by radio; now served by AI DJ.

Fault-tolerant UI: UI designed to match the actual hit-rate of the underlying ML model. If the model is right 1 in 5 times, show 5 candidates so one is likely to resonate. Attributed to Chris Dixon.

Taste bubble: Spotify’s user research term for the phenomenon where algorithmic recommendation reinforces existing taste rather than expanding it. The case for discovery-mode features.

Squads model: Spotify’s famous original org structure: small (c. 7-person), full-stack, highly autonomous teams. Now largely abandoned at scale in favour of larger teams and VP-level autonomy.

Recall vs discovery: Key distinction in Spotify’s home page analysis. Recall = accessing something the user already knows they want. Discovery = finding something new. The home page was 90% recall in use; a 2023 redesign incorrectly shifted it to 90% discovery.

Think it, build it, ship it, tweak it: Spotify’s four-phase product vocabulary. Risk must be reduced in the “think it” phase before “build it” spending begins.

Hard API mandate: Bezos’s rule at Amazon: all teams must expose their technology through hard, well-defined APIs. Gustav’s reading: this forced the internal cooperation that enabled AWS; without it, decentralised competition would have made external exposure impossible.


§ The curation-to-generation shift [§ Framework]

Gustav’s three-era model [§ ~00:00–00:14]:

  • Curation era: digitise a good (music, people, books), put it online, let users curate. Facebook, early Spotify.
  • Recommendation era: replace human curation with algorithms. Required rethinking the full UX and sometimes the business model. Spotify’s Discover Weekly, personalised playlists.
  • Generation era: content is generated, not curated or selected. Gustav’s thesis: each earlier transition required platforms to “rethink the entire user experience”, and the generation shift will demand the same.

The key product implication: don’t treat generative AI as better recommendation. It’s a different paradigm — the user interface needs to be reinvented. This is why AI DJ could not have existed as a feature added to the recommendation system; it required a fundamentally new interaction model.

In ML terms: generation has no prior signal from the user (if you have signal, the content isn’t new), so hit rates will be much lower. The UX needs to accommodate that. Discovery feeds need to be fast, cheap to reject, and built with a tolerance for misses that matches the likely hit rate.


§ Fault-tolerant UI design [§ ML-design principle]

Gustav’s formulation [§ ~00:18–00:20], attributed to Chris Dixon: design your UI to match the performance of your ML. Key implications:

  • A “single big play button” only works with near-zero prediction error. In practice, show as many candidates as the model needs to make at least one good hit on screen likely.
  • Midjourney validated this independently: it generated 4 simultaneous low-res images because the model’s hit rate was roughly 1 in 4. If users rate any single image as “good” about 25% of the time, a grid of 4 contains at least one image worth refining in most cases (about 68% of the time if the four results are independent), against 25% for a single image.
  • AI DJ’s rule: “do as little as possible and get out of the way.” Because users came for music, the generative layer needed to minimise its own footprint and avoid showcasing the technology at the expense of user goals.

This principle scales: as model performance improves, you can reduce the number of candidates shown. UI complexity should be a function of current ML quality, not of what would look impressive.
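The sizing rule above can be sketched numerically. This is a minimal illustration, assuming every candidate is an independent hit with the same probability (real recommendations are correlated, so the numbers are rough). The function names and the 90% target are hypothetical and do not come from the talk.

```python
def p_at_least_one_hit(hit_rate: float, candidates: int) -> float:
    """Probability that at least one of `candidates` independent suggestions is a hit."""
    return 1.0 - (1.0 - hit_rate) ** candidates


def candidates_needed(hit_rate: float, target: float) -> int:
    """Smallest number of candidates whose at-least-one-hit probability reaches `target`."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 1
    # Add candidates one at a time until the screen is likely enough to contain a hit.
    while p_at_least_one_hit(hit_rate, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # A 4-image grid at a 1-in-4 hit rate.
    print(round(p_at_least_one_hit(0.25, 4), 2))  # 0.68
    # Candidates needed for a 90% chance of one hit as the model improves.
    for rate in (0.25, 0.5, 0.95):
        print(rate, candidates_needed(rate, 0.9))
```

Under these assumptions the required count falls from 9 candidates at a 25% hit rate to 4 at 50% and 1 at 95%, which is the "fewer candidates as the model improves" relationship in numeric form.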


§ Squads model and autonomy at the VP level [§ Org design]

Spotify moved away from squads [§ ~00:28–00:34]. The original design was correct for a small, young, Swedish-influenced startup: high trust, flat hierarchy, maximum brain-power utilisation. The problems at scale:

  1. Scaling granularity: teams of 7 create enormous coordination overhead at hundreds of teams. More practical to have teams of 14+ with fewer overhead roles.
  2. Leaf-level autonomy with a junior org: 100 squads with 100 strategies and 100 directions. Produces “heat”: motion without directional coherence. Gustav’s phrase: “we managed to get somewhere in spite of this, but I’d struggle to say we were efficient.”
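A rough way to see the granularity problem in point 1 is to count how many team-to-team relationships exist. The sketch below assumes coordination cost grows with the number of team pairs; that model, the function name, and the 700-person headcount are illustrative assumptions, not figures from the talk.

```python
def team_pairs(headcount: int, team_size: int) -> int:
    """Number of distinct team-to-team pairs when `headcount` people are split into teams."""
    if headcount <= 0 or team_size <= 0:
        raise ValueError("headcount and team_size must be positive")
    teams = -(-headcount // team_size)  # ceiling division
    return teams * (teams - 1) // 2


if __name__ == "__main__":
    print(team_pairs(700, 7))   # 100 teams -> 4950 pairs
    print(team_pairs(700, 14))  # 50 teams  -> 1225 pairs
```

Under this assumption, doubling team size from 7 to 14 cuts the number of potential coordination links by roughly a factor of four.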

The fix: move autonomy from the leaf to the VP level. VP-level leaders are senior (high pattern recognition), numerous enough to avoid a single-person bottleneck, and sparse enough to avoid 100-direction divergence. In Spotify’s case: Maya, VP Podcasting, defines the podcasting strategy autonomously without Gustav making those calls.

The extreme models illustrate the trade-offs: Amazon (decentralised, speed, ships complexity to users) vs Apple (centralised, cleaner UX, slower). Spotify chose the Apple end of the spectrum because its strategy is a single unified application spanning music, podcasts, and audiobooks — that strategy requires a single recommendation engine and a coherent UX, which require central coordination.


§ The Spotify home redesign postmortem [§ Product learning]

A 2023 redesign that added TikTok-style discovery feeds to Spotify’s home screen drew strong negative user reaction [§ ~00:43–00:54]. Gustav’s analysis:

What was right: Sub-feeds for discovery (podcast feed, music feed) worked as intended. Users who want to explore new music use them and save new songs. These are additive features that work.

What was wrong: Placing discovery feeds on the Home page shifted the 90/10 recall/discovery ratio to roughly 10/90. Users came to Home for recall (getting back to a playlist, continuing a podcast). The redesign eliminated that utility.

The signal parsing problem: Negative redesign feedback comes in two flavours — “you changed something and I’m upset” (habit break) vs “you changed something and it’s actually wrong” (real error). Both look the same in Twitter mentions. How to distinguish:

  • New user cohorts vs existing user cohorts: new users have no habits to break.
  • Traffic migration: users migrating from Home to Search and Library signals loss of recall utility, not just habit break.
  • UI repurposing: users bending a discovery-optimised UI (slot machine) to recall purposes (finding a known playlist) is a clear signal that the recall function was removed.
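The first two checks above can be sketched as a simple triage rule. This is a hedged illustration only: the `RedesignSignals` fields, the `diagnose` function, and the 5% threshold are hypothetical, and the notes do not describe how Spotify actually computed these signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedesignSignals:
    # Relative change in a satisfaction or retention metric, redesign vs control.
    # Negative means the redesign did worse. All fields are hypothetical inputs.
    new_user_delta: float
    existing_user_delta: float
    # Relative change in the share of sessions that leave Home for Search or Library.
    home_exit_delta: float


def diagnose(signals: RedesignSignals, threshold: float = 0.05) -> str:
    """Rough triage of negative redesign feedback into habit break vs real error."""
    new_hurt = signals.new_user_delta <= -threshold
    existing_hurt = signals.existing_user_delta <= -threshold
    migrating = signals.home_exit_delta >= threshold

    if new_hurt or migrating:
        # New users have no habits to break, and migration to Search and Library
        # suggests recall utility was lost: treat as a real product error.
        return "real error"
    if existing_hurt:
        # Only users with established habits are worse off.
        return "habit break"
    return "no clear regression"


if __name__ == "__main__":
    print(diagnose(RedesignSignals(-0.01, -0.12, 0.00)))  # habit break
    print(diagnose(RedesignSignals(0.00, -0.12, 0.20)))   # real error
```

The design choice is that either cohort evidence or traffic migration is enough to flag a real error, since each signal is independent of user habit.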

The lesson about redesigns: Unlike feature additions (voluntary participation), redesigns impose costs on all users — even those who don’t want the change. Budget for significantly more negative feedback, and don’t treat it as proof the direction is wrong until you’ve separated the sources.

“Strong opinions loosely held” operationalised: believe in a hypothesis 100% until the data says no, then update and believe the new hypothesis 100%. The risk is emotional attachment — treating the redesign as a bet you can’t lose rather than a hypothesis you want to test.


§ Generative music and platform rights [§ AI music]

Gustav’s analogical frame [§ ~00:21–00:26]: generative music tools are instruments, not threats. The DAW (digital audio workstation) wasn’t initially accepted as “real” music production: Avicii, who couldn’t play an instrument, was dismissed until it became obvious he had musical talent. EDM was the new genre that only the DAW could produce. Generative AI may likewise produce a new genre that only it can produce.

The rights issue is separate from the music quality issue. Spotify has navigated a technology shift like this before: peer-to-peer piracy was exciting for consumers but unsustainable for creators until a new business model (streaming) emerged. Gustav’s prediction: the same cycle will happen with generative music. The music industry grew larger after streaming than it was before piracy. The business model innovation hasn’t happened yet for generative music, but Gustav believes it will.

One useful observation: the “AI versus human” split is already a false distinction. Talented real musicians use AI to generate new ideas. The question is what percentage is AI and who controls the rights, not whether AI music is legitimate.


§ The magic trick principle [§ Product virality]

A pattern Gustav observes across viral product launches [§ ~00:26–00:28]: great products that take off have a “magic trick” moment, something that seems impossible or inexplicable on first encounter. Examples: AI DJ (how could it know my music that specifically?), Midjourney (how did it generate that image from text?), ChatGPT (how can it respond like a person?).

The magic trick wears off with repeated exposure. Its commercial value is in triggering the initial adoption loop and word-of-mouth. Fine-tuning a product to the point where it feels like magic is often about narrowing scope and raising the hit rate of a few key interactions, not about adding features.

Gustav’s formulation: “It’s 0% art, 0% magic, and 100% science.” The claim is a provocation: he uses it to force teams to articulate what mechanism creates the magic-seeming experience, rather than hiding behind “product instinct.”