Notes — Itamar Gilad on Evidence-Guided Product

Four questions [Adler frame]

Q1 — What is it about? A meta-framework for shifting product organisations from opinion-based to evidence-guided decision-making. Itamar Gilad’s GIST model (Goals, Ideas, Steps, Tasks) integrates lean startup, design thinking, product discovery, and agile delivery into one coherent system. The episode covers the GIST model in detail, plus supporting tools: metrics trees, ICE + confidence meter, and the GIST board. Grounded in Google experience: Google+ (opinion-based failure) vs. Gmail tabbed inbox (evidence-guided success).

Q2 — How is it argued? Through direct contrast (Google+ at 1,000 people vs. tabbed inbox starting with no code, just a facade of HTML). The argument is empirical: evidence-guided companies like Amazon, Airbnb, and Google (in its best periods) balance human judgement with evidence rather than eliminating either. Each framework layer is presented with a ‘what people do wrong’ diagnosis and a prescriptive alternative.

Q3 — Is it true? The Google+ story is well-documented and the critique of plan-and-execute at scale is credible. The GIST model is a synthesis of well-validated methodologies (lean startup, design thinking, build-measure-learn). The confidence meter formalises intuitions most experienced PMs already act on. Weaknesses: the advice is high-context; applying it in top-down cultures faces political barriers the framework only partially addresses.

Q4 — What of it? GIST is most actionable at the team and mid-organisation level. The confidence meter is the single most portable tool: it is immediately applicable as a conversation aid in any prioritisation discussion. Moving from release roadmaps to outcome roadmaps is the most culturally disruptive change proposed; implementation requires exec alignment. The GIST board operationalises continuous discovery and delivery at the team level without requiring org-wide transformation.

Glossary

Opinion-based development: product decisions driven primarily by conviction, seniority, or narrative rather than empirical evidence about user behaviour. The Google+ pattern: ‘we all believe in this thing.’ [Contrast: evidence-guided]

Evidence-guided development: decisions informed by empirical evidence at each stage of discovery and delivery. Not the absence of judgement — judgement is ‘supercharged’ by evidence, not replaced.

GIST: Goals, Ideas, Steps, Tasks. A four-layer meta-framework for evidence-guided product development. Each layer addresses a different failure mode.

North Star metric: the metric that most directly measures value delivered to users (e.g., messages sent for WhatsApp, nights booked for Airbnb). Distinct from business KPIs (which measure value captured). Both together define the value exchange loop.

Metrics tree: a hierarchical decomposition of a top-level metric into its sub-drivers. Used to identify alignment, assign team ownership, and estimate the impact of sub-metric changes on the whole.

ICE: Impact, Confidence, Ease — a prioritisation framework for ideas. Impact = effect on goals; Ease = inverse of effort; Confidence = how sure we should be about our impact and ease estimates. Created by Sean Ellis.

Confidence meter: Itamar’s tool for calibrating ICE confidence scores. Ranges from 0 (self-conviction only) to 10 (AB test result). Organises evidence types into tiers: opinion-class (0–1), stakeholder review (1–2), estimates and plans (2–3), anecdotal data (3–4), market data (4–5), tests of various types (5–9), experiments (8–10). [See GIST Framework]

Fake door / smoke test / Wizard of Oz: low-fidelity validation techniques that test user interest or behaviour without building a real product. The Gmail tabbed inbox was validated with a Wizard of Oz test: researchers manually sorted each participant’s top 50 emails into tabs, without writing a single line of code.

Fish food: testing a product feature within your own team, a narrower circle than company-wide dog fooding. A Google-internal term.

GIST board: a lightweight team-level project management tool showing goals (key results), ideas under consideration (with ICE scores), and the next steps for each idea. Updated continuously; reviewed every two weeks. Bridges the gap between strategic roadmaps and task-level Agile execution.

Outcome roadmap: a roadmap framed as outcomes to achieve by a date (‘reduce churn by X% by Q3’) rather than features to ship (‘launch onboarding wizard by Q3’). Does not prematurely commit to solutions; leaves discovery open.

Release roadmap: a traditional feature-and-date roadmap. Competes with evidence-guided methods by pre-committing to solutions before confidence is established.

Value exchange loop: the core business dynamic, in which the organisation delivers value to users (measured by the North Star metric) and captures value back (measured by a revenue/profit KPI). Keeping both growing and in balance is the meta-goal.

Google+ vs. Gmail tabbed inbox: the core contrast

Google+ (opinion-based): ~1,000 people at peak; existed as a separate division; plan-and-execute mode. Hypothesis: people need another social network. Nobody tested whether users wanted this before the massive investment. Result: shut down in 2019. Lesson: even smart leaders at a smart company can build the wrong thing for years when operating without evidence discipline.

Gmail tabbed inbox (evidence-guided):

Started as an idea nobody believed in.
Before writing a single line of code: Wizard of Oz test — researchers manually pre-sorted subjects’ inboxes into tabs, interviewed participants.
Result: ~85–88% of passive inbox users loved it; power users (who can manage their own inbox) found it pointless.
Discovery prevented the team from wrongly dismissing it based on internal ‘power user’ bias.

The tabbed inbox succeeded because the team separated their own opinions from user evidence. The Google+ team did not.

GIST framework: layer by layer

G — Goals

The wrong version: goals as a planning exercise (‘what do we build by when?’). Siloed team-level goals (engineering goals, marketing goals) that push teams in different directions.

The right version: two overarching metrics for the entire organisation:

North Star metric — value delivered to users (e.g., messages sent, nights booked, active learning users)
Top business KPI — value captured (revenue, profit, market share)

Build a metrics tree from each. The two trees often overlap in the middle, revealing sub-metrics that are double-load-bearing — moving them shifts both user value and business value.

Metrics tree uses:

Alignment: teams debate ideas using shared metrics, not differing priorities.
Ownership: sub-metrics assigned to teams as their area of responsibility.
Estimation: shows how much a sub-metric change at the bottom propagates to the top.
Team topology: org structure can be rationalised around metrics rather than functional hierarchy.
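
The estimation and overlap uses above can be sketched in a few lines. This is a minimal illustration, not Gilad's method: the `Metric` class, the weighted additive propagation, and every metric name and weight other than ‘nights booked’ are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Metric:
    """One node of a metrics tree.

    `weight` is the assumed share of the parent's value driven by this
    sub-metric (a hypothetical modelling choice, not from the notes).
    """
    name: str
    weight: float = 1.0
    children: List["Metric"] = field(default_factory=list)


def names(root: Metric) -> Set[str]:
    """Collect every metric name in the tree."""
    found = {root.name}
    for child in root.children:
        found |= names(child)
    return found


def shared_submetrics(a: Metric, b: Metric) -> Set[str]:
    """Sub-metrics that appear in both trees (roots excluded)."""
    return (names(a) - {a.name}) & (names(b) - {b.name})


def top_level_lift(root: Metric, target: str, lift: float) -> float:
    """Relative lift at the root if `target` improves by `lift` (0.10 = +10%)."""
    if root.name == target:
        return lift
    for child in root.children:
        propagated = top_level_lift(child, target, lift)
        if propagated:
            # Scale the child's lift by its assumed share of the parent.
            return child.weight * propagated
    return 0.0


# Illustrative trees; sub-metric names and weights are invented.
north_star = Metric("nights booked", children=[
    Metric("returning guests", weight=0.4, children=[
        Metric("repeat bookings", weight=0.5),
    ]),
    Metric("new guests", weight=0.6),
])
revenue = Metric("revenue", children=[
    Metric("repeat bookings", weight=0.3),
    Metric("average booking value", weight=0.7),
])

print(shared_submetrics(north_star, revenue))
print(top_level_lift(north_star, "repeat bookings", 0.10))
```

Under these assumed weights, a 10% lift in the shared sub-metric moves the North Star by roughly 2%. Real trees are rarely purely additive, so treat the output as a rough sizing aid for discussion.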

OKR integration: metrics trees + mission populate the OKRs. Max four key results per team. Supplementary OKRs cover product health.

I — Ideas

The wrong version: battle of opinions; highest-paid person’s opinion (HiPPO) wins; ideas evaluated based on strategic themes (‘it’s about AI, therefore it’s good’).

The right version: ICE prioritisation, calibrated by the confidence meter.

ICE:

Impact: effect on the stated goal. Hard to estimate: ideally backed by test data; at minimum, produced by structured estimation rather than gut feel.
Ease: inverse of effort. Also a guesstimate — but asking the question improves the discussion.
Confidence: how sure should we be about I and E? Most people assign high confidence prematurely.
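
A small sketch of ICE ranking follows. The notes do not say how the three values are combined; multiplying them on a 0–10 scale is an assumption here (a common convention), and the `Idea` class and both sample ideas are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Idea:
    name: str
    impact: float      # 0-10, effect on the stated goal
    confidence: float  # 0-10, read off the confidence meter
    ease: float        # 0-10, inverse of effort

    @property
    def ice(self) -> float:
        # Assumed combination rule: simple product of the three values.
        return self.impact * self.confidence * self.ease


def rank(ideas: List[Idea]) -> List[Idea]:
    """Order ideas from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice, reverse=True)


# Hypothetical ideas: a big bet backed only by opinion versus a modest
# idea backed by market data.
ideas = [
    Idea("AI assistant", impact=9, confidence=0.1, ease=3),
    Idea("Onboarding fix", impact=4, confidence=5, ease=6),
]

for idea in rank(ideas):
    print(f"{idea.name}: {idea.ice:.1f}")
```

With a product rule, a near-zero confidence score pulls even a high-impact idea to the bottom of the list, which is the behaviour the confidence meter is meant to encourage.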

Confidence meter tiers (0–10):

0–1: self-conviction; pitch decks; strategic theme alignment. Evidence of this kind is capped at 0.1 by design.
1–2: stakeholder/colleague review. Groups have biases; can produce worse decisions than individuals.
2–3: back-of-envelope estimates and plans.
3–4: anecdotal data — a handful of interviews, a competitor has this feature.
4–5: market data from surveys, competitive analysis, field research.
5–9: tests (usability tests, fake doors, early adopter programmes, previews).
8–10: experiments (AB tests, multivariate tests, staged rollouts).

Key principle: tie investment to confidence level. Start with cheap tests to gain confidence before committing to full builds.
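
One way to apply the tiers above is to cap any claimed confidence score at the ceiling of the strongest evidence actually gathered. The sketch below assumes that capping rule; the tier keys and the `calibrated_confidence` helper are hypothetical names, while the ranges are copied from the list above.

```python
from typing import Dict, List, Tuple

# (low, high) confidence range per evidence tier, as listed in these notes.
CONFIDENCE_TIERS: Dict[str, Tuple[float, float]] = {
    "opinion": (0.0, 0.1),            # self-conviction, pitch decks, themes; capped at 0.1
    "stakeholder_review": (1.0, 2.0),
    "estimates_and_plans": (2.0, 3.0),
    "anecdotal_data": (3.0, 4.0),
    "market_data": (4.0, 5.0),
    "tests": (5.0, 9.0),
    "experiments": (8.0, 10.0),
}


def calibrated_confidence(claimed: float, evidence: List[str]) -> float:
    """Cap a claimed confidence score at what the evidence supports.

    Assumed rule: the ceiling is the upper bound of the strongest tier
    present. Unknown tier names raise KeyError.
    """
    if not evidence:
        return 0.0
    ceiling = max(CONFIDENCE_TIERS[kind][1] for kind in evidence)
    return min(claimed, ceiling)


# A team claims 8/10 with only a pitch deck, then again after a survey.
print(calibrated_confidence(8, ["opinion"]))
print(calibrated_confidence(8, ["opinion", "market_data"]))
```

The capped score can then feed the Confidence term of ICE, so that larger investment only becomes justifiable once stronger evidence has been gathered.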

S — Steps

The wrong version: ‘build an MVP’ = build something roughly complete and ship it. This conflates discovery and delivery.

The right version: a spectrum of validation methods, from zero-code to full release, ordered by cost and confidence gained:

Level	Examples	Cost
Assessment	ICE analysis, assumption mapping, stakeholder 1:1s, business modelling	Very low
Data	User interviews, competitive analysis, surveys, field observation	Low–medium
Fake/low-fidelity tests	Fake door, smoke test, Wizard of Oz, concierge	Low
Rough builds	Fish food (team testing), early adopter programme, longitudinal study	Medium
More complete builds	Dog fooding, preview, beta, labs	Medium–high
Experiments	AB test, multivariate, staged rollout, hold-backs	High

Each step is a learning milestone, not an engineering milestone. After each step, the team can: continue (if evidence positive), pivot the idea, or kill it and pick the next ICE idea.
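
The continue / pivot / kill loop can be sketched as a walk up the validation ladder from the table above. The rung identifiers, the `next_rung` helper, and the rule that a pivoted idea restarts at assessment are all assumptions for illustration; the notes do not specify where a pivot resumes.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    CONTINUE = "continue"  # evidence positive: move to the next step
    PIVOT = "pivot"        # change the idea and re-validate
    KILL = "kill"          # drop it and pick the next ICE idea


# Validation levels in the order given in the table above.
LADDER = [
    "assessment",
    "data",
    "fake_test",
    "rough_build",
    "complete_build",
    "experiment",
]


def next_rung(current: str, decision: Decision) -> Optional[str]:
    """Return the next validation level, or None if the idea is killed."""
    if decision is Decision.KILL:
        return None
    if decision is Decision.PIVOT:
        # Assumption: a pivoted idea is treated as new and starts over.
        return LADDER[0]
    position = LADDER.index(current)
    # Stay on the last rung once it has been reached.
    return LADDER[min(position + 1, len(LADDER) - 1)]


print(next_rung("data", Decision.CONTINUE))
print(next_rung("rough_build", Decision.PIVOT))
print(next_rung("fake_test", Decision.KILL))
```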

T — Tasks

The wrong version: developers live in the Agile world (story points, Jira tickets), managers live in the planning world (roadmaps, strategy), and PMs are stuck in the middle trying to translate. Developers are disengaged from users and outcomes.

The right version: GIST board — a team-level tool showing:

Goals (key results, max 4, set at start of quarter)
Ideas under consideration (with ICE scores)
Next steps for each idea (learning milestones)

Updated continuously. Team reviews every two weeks. The discussion: Are we still working on the right ideas? How are we doing against goals? What’s blocking the most important steps? This ‘middle layer’ discussion is the one that usually doesn’t happen.

GIST board replaces (or supplements) the product backlog with a steps backlog — a queue of validation steps rather than features to ship.
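
As a data structure, the board and its steps backlog might look like the sketch below. The class and field names are hypothetical; only the three sections (goals, ideas with ICE scores, next steps) and the limit of four key results come from the notes. Ordering the backlog by ICE score is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_KEY_RESULTS = 4  # per the notes: max 4 key results per team


@dataclass
class BoardIdea:
    name: str
    ice: float
    steps: List[str] = field(default_factory=list)  # pending learning milestones


@dataclass
class GistBoard:
    key_results: List[str]
    ideas: List[BoardIdea] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.key_results) > MAX_KEY_RESULTS:
            raise ValueError(f"at most {MAX_KEY_RESULTS} key results allowed")

    def steps_backlog(self) -> List[Tuple[str, str]]:
        """Next pending step per idea, highest ICE first (assumed ordering)."""
        ranked = sorted(self.ideas, key=lambda idea: idea.ice, reverse=True)
        return [(idea.name, idea.steps[0]) for idea in ranked if idea.steps]


# Hypothetical board contents for illustration.
board = GistBoard(
    key_results=["reduce churn by X%"],
    ideas=[
        BoardIdea("Onboarding wizard", ice=40, steps=["fake door", "fish food"]),
        BoardIdea("Win-back emails", ice=90, steps=["user interviews"]),
        BoardIdea("Loyalty badges", ice=10),  # no steps planned yet
    ],
)

for idea_name, step in board.steps_backlog():
    print(f"{idea_name}: {step}")
```

The two-week review then amounts to re-reading this structure: re-score the ideas, drop or add steps, and check progress against the key results.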

Outcome vs. release roadmaps

Release roadmaps pre-commit to solutions while confidence is still low. They create a ‘launch by October’ mandate that kills evidence-guided behaviour even in teams that want to be evidence-guided.

Outcome roadmaps commit to goals: ‘reduce churn by X% by Q4.’ The solution is left open until confidence is gained. When a high-confidence idea is ready to ship, it can be promoted to a dated release milestone. The shift is culturally disruptive and requires exec alignment.

Stage calibration

Stage	GIST adaptation
Pre-PMF startup	Goal: find PMF. Iterate fast. North Star and business KPI may be unknown. Full metrics trees are overkill.
Series A–B	Goal: establish business model. Start building metrics. Lightweight GIST board useful.
Scale-up	Full GIST, metrics trees, outcome roadmaps. The cost of opinion-based development is highest here (more people, more wasted capacity).