Notes: Keith Coleman and Jay Baxter on Community Notes, the Bridging Algorithm, and Fixing Misinformation

Four questions [Adler frame]

Q1 — What is it about? The episode explains how Community Notes works mechanically and philosophically: a crowdsourced context layer on X (formerly Twitter) that surfaces notes only when raters who normally disagree with each other find them helpful. Keith Coleman (Product Lead) and Jay Baxter (Founding ML Engineer) walk through the algorithm, the contributor community, the transparency principles, and how the product survived four leadership regimes.

Q2 — How is it argued? Primarily through mechanism description — Coleman explains design decisions and their rationale, Baxter explains the ML system. Both draw on empirical evidence: external studies on attitude change, engagement-drop A/B tests, scale statistics. The argument is iterative: each design choice is traced to a principle, and each principle is defended by observed outcome.

Q3 — Is it true? The empirical claims are well-sourced. External researchers (independent of the team) found that notes change agreement with misleading claims; multiple research groups replicated the 50–60% repost-drop finding using difference-in-differences. The 8% note-display rate and 30 billion impressions in 2024 are stated figures without external verification in this context, but are plausible given the scale of X. The claim that the full algorithm can run on a single machine (given ~500 GB RAM) has been publicly confirmed by Vitalik Buterin, who ran it independently on the published data.

The key theoretical claim — that cross-partisan agreement selects for accurate, neutral notes — rests on a reasonable but contested assumption: that people who normally disagree cannot coordinate to approve false information at scale. The team acknowledges this is an empirical bet; pilot data and external research support it.

Q4 — What of it? Community Notes offers a replicable model for internet-scale epistemic infrastructure: open, crowdsourced, algorithm-governed, with no editorial override button. Its adoption by Meta signals it has become the industry default. The deeper implication, which Coleman surfaces at the end, is political: if cross-partisan agreement can surface around factual corrections, similar mechanisms might find agreement on policy. The product is also a proof of concept for small-team, mission-driven product development inside large organisations.

Glossary

Community Notes — A feature on X (formerly Twitter; the feature itself was originally called Birdwatch) that allows registered contributors to write contextual notes on posts they consider misleading. A note displays publicly only when the bridging algorithm determines that sufficient cross-partisan agreement exists among raters.

Bridging algorithm (bridging-based agreement) — The core scoring method. Rather than taking a majority vote of all raters, the algorithm looks for agreement from people who have previously disagreed with each other. Notes that attract this ‘surprising agreement’ score highly; notes that only attract support from one ideological cluster do not display.

Rating matrix — The data structure underlying the algorithm. Each rating a contributor gives a note fills one cell of a contributor × note matrix (helpful / not helpful / various tags). Most cells are empty because each contributor rates only a small fraction of notes. Matrix factorisation decomposes this matrix into latent factors. In effect, it estimates each contributor's general political or ideological lean and then scores note helpfulness while controlling for that lean.
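The factorisation can be written out as an equation. What follows is a sketch of a standard matrix-factorisation formulation, assumed here rather than taken from the episode. The symbols are: one observed rating r_{un} from contributor u on note n; a global intercept μ; contributor and note intercepts i_u and i_n; and latent factor vectors f_u and f_n. The penalty weights λ_i and λ_f are generic regularisation terms, not figures quoted by the team.

```latex
\hat{r}_{un} = \mu + i_u + i_n + f_u \cdot f_n

\min_{\mu,\, i,\, f} \; \sum_{(u,n)\ \text{observed}} \bigl(r_{un} - \hat{r}_{un}\bigr)^2
  + \lambda_i \Bigl(\sum_u i_u^2 + \sum_n i_n^2\Bigr)
  + \lambda_f \Bigl(\sum_u \lVert f_u \rVert^2 + \sum_n \lVert f_n \rVert^2\Bigr)
```

Under this reading, the note intercept i_n acts as the helpfulness score. It is the part of a note's support that the viewpoint term f_u · f_n cannot explain.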

Helpfulness score threshold — Set at 0.4 on the algorithm’s internal scale. The threshold was calibrated against user feedback rather than derived analytically; it is conservative by design to protect note quality.

Contributor community — The ~950,000 registered volunteers (as of the episode) who rate and write notes. Entry requires a verified phone number. The ability to write notes must be earned through a rating track record: a contributor's ratings must show they can pick out notes that the wider community subsequently rates as helpful.

Note helpfulness score — The output of the bridging algorithm for a given note. Reflects estimated cross-partisan agreement, not raw rater counts. Notes above the threshold display; notes below do not, even if a majority of raters found them helpful.

Supernotes — An external research project that uses LLMs to generate note variants and a simulated contributor jury (based on past rating behaviour) to predict which variants would score above threshold. Represents the first substantial external contribution to the core algorithm.

The bridging algorithm

The central design decision is that Community Notes does not use a majority vote. A majority vote would amplify whichever ideological group has more contributors at a given moment — exactly the manipulation vector the team needed to close.

The algorithm instead uses matrix factorisation (trained with gradient descent) to estimate a latent factor for each contributor — roughly capturing their ideological position — and each note. A note’s helpfulness score reflects agreement across the polarisation axis rather than within it. The practical implication: a note supported only by users who already agree with each other will not display, regardless of their raw count.
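The scoring step can be illustrated with a toy NumPy fit. This is a sketch under stated assumptions, not the open-source implementation:

- Each rating is predicted as a global intercept plus a contributor intercept, a note intercept, and the product of one-dimensional contributor and note factors.
- The model is fitted by plain full-batch gradient descent.
- The penalty weights (0.15 on intercepts, 0.03 on factors), the learning rate, the toy raters and the name `fit_note_intercepts` are all hypothetical.

The note intercept stands in for the helpfulness score.

```python
import numpy as np


def fit_note_intercepts(ratings, n_raters, n_notes, lam_i=0.15, lam_f=0.03,
                        lr=0.01, epochs=5000, seed=0):
    """Fit r_hat = mu + i_u + i_n + f_u * f_n (1-D factors) by full-batch gradient descent.

    ratings: (rater_id, note_id, value) triples, value 1.0 = helpful, 0.0 = not helpful.
    Returns the note intercepts i_n, read here as helpfulness net of the viewpoint factor.
    """
    rng = np.random.default_rng(seed)
    u = np.array([r for r, _, _ in ratings])
    n = np.array([m for _, m, _ in ratings])
    y = np.array([v for _, _, v in ratings], dtype=float)

    mu = 0.0                                   # global intercept, left unpenalised here
    i_u, i_n = np.zeros(n_raters), np.zeros(n_notes)
    f_u = rng.normal(0.0, 0.1, n_raters)       # rater viewpoint (latent lean)
    f_n = rng.normal(0.0, 0.1, n_notes)        # note viewpoint

    for _ in range(epochs):
        err = mu + i_u[u] + i_n[n] + f_u[u] * f_n[n] - y
        # Half-gradients of squared error plus L2 penalties (intercepts penalised harder)
        g_mu = err.sum()
        g_iu = np.bincount(u, weights=err, minlength=n_raters) + lam_i * i_u
        g_in = np.bincount(n, weights=err, minlength=n_notes) + lam_i * i_n
        g_fu = np.bincount(u, weights=err * f_n[n], minlength=n_raters) + lam_f * f_u
        g_fn = np.bincount(n, weights=err * f_u[u], minlength=n_notes) + lam_f * f_n
        mu -= lr * g_mu
        i_u -= lr * g_iu
        i_n -= lr * g_in
        f_u -= lr * g_fu
        f_n -= lr * g_fn
    return i_n


# Hypothetical toy data: raters 0-9 lean one way, raters 10-19 the other.
ratings = []
for rater in range(20):
    left = rater < 10
    ratings.append((rater, 0, 1.0))                   # note 0: both sides rate it helpful
    ratings.append((rater, 1, 1.0 if left else 0.0))  # note 1: only one side rates it helpful
    ratings.append((rater, 2, 0.0 if left else 1.0))  # note 2: only the other side does
    ratings.append((rater, 3, 0.0))                   # note 3: nobody rates it helpful

THRESHOLD = 0.4  # the figure quoted in the notes; the toy scale here is illustrative only
note_intercepts = fit_note_intercepts(ratings, n_raters=20, n_notes=4)
shown = [k for k, score in enumerate(note_intercepts) if score >= THRESHOLD]
print(shown)
```

In this toy run, notes 1 and 2 each collect ten helpful ratings. The factor term absorbs that one-sided support and leaves their intercepts near zero. Only note 0, rated helpful across both groups, clears 0.4. In this sketch, penalising intercepts more heavily than factors pushes the model to explain polarised ratings through the viewpoint term before crediting a note's intercept.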

The first algorithm Jay Baxter shipped was a PageRank variant, designed for manipulation resistance (it resists voting rings). Pilot data revealed that the real threat was not coordinated manipulation but simple partisan bias in ratings. Once that was clear, the team held an internal bake-off — Kaggle-style — and the bridging-based approach won.

The 0.4 threshold is conservative. The team prefers false negatives (good notes not shown) over false positives (bad notes shown), because a single well-publicised inaccurate note would erode the trust the entire product depends on. In practice, roughly 8% of proposed notes display. The team acknowledges that some genuinely good notes are among the ~92% that are filtered out.

A secondary filter runs on ‘incorrect’ tags: even a note above the helpfulness threshold will not display if a significant share of its raters tagged it as containing factual errors.
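The display decision in the two paragraphs above can be sketched as a two-stage check. This is a hypothetical illustration. The 0.40 threshold comes from the notes, but the `NoteScore` fields and the 25% cutoff standing in for "a significant share" are assumptions. They are not values from the episode or the open-source code.

```python
from dataclasses import dataclass

HELPFUL_THRESHOLD = 0.40      # threshold quoted in the notes
INCORRECT_SHARE_LIMIT = 0.25  # hypothetical stand-in for "a significant share"


@dataclass
class NoteScore:
    note_id: str
    helpfulness: float   # bridging score for the note (e.g. its intercept)
    incorrect_tags: int  # raters who tagged the note as containing factual errors
    total_raters: int


def should_display(score: NoteScore) -> bool:
    # Stage 1: below the bridging threshold, the note stays hidden however many raw votes it has
    if score.helpfulness < HELPFUL_THRESHOLD:
        return False
    # Stage 2: a note that clears the threshold is still held back if too many raters flag errors
    if score.total_raters > 0 and score.incorrect_tags / score.total_raters >= INCORRECT_SHARE_LIMIT:
        return False
    return True


for s in [NoteScore("a", 0.45, 1, 20), NoteScore("b", 0.45, 8, 20), NoteScore("c", 0.35, 0, 20)]:
    print(s.note_id, should_display(s))
```

Raising `HELPFUL_THRESHOLD` or lowering `INCORRECT_SHARE_LIMIT` hides more notes. Either change trades false positives for false negatives, which is the direction the team says it prefers.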

Contributor community design

Entry and earning. Any X user with a verified phone number can join the contributor waitlist. Contributors begin by rating existing notes. The ability to write notes is earned: a contributor must demonstrate, through their rating history, that they can identify notes the broader community finds helpful. Coleman describes this as deliberately hard to earn — it filters for contributors with genuine cross-partisan discernment.
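One way to picture "earning" the ability to write is a running score over a contributor's rating history. In this hypothetical sketch, a rating counts +1 when it matches the status a note eventually settles on and -1 when it does not. Writing unlocks at a score of 5. The scoring rule, the status labels and the unlock value are all assumptions for illustration, not details confirmed in the episode.

```python
WRITE_UNLOCK = 5  # hypothetical unlock score, not a figure from the episode


def rating_impact(history):
    """history: (my_rating, final_status) pairs, one per note this contributor rated."""
    score = 0
    for my_rating, final_status in history:
        if final_status not in ("HELPFUL", "NOT_HELPFUL"):
            continue  # note has no settled status yet, so the rating proves nothing either way
        score += 1 if my_rating == final_status else -1
    return score


def can_write_notes(history):
    return rating_impact(history) >= WRITE_UNLOCK


history = [("HELPFUL", "HELPFUL")] * 4 + [
    ("NOT_HELPFUL", "NOT_HELPFUL"),
    ("HELPFUL", "NEEDS_MORE_RATINGS"),  # hypothetical unsettled status, skipped
]
print(rating_impact(history), can_write_notes(history))  # 5 True
```

Scoring against the settled outcome rather than against any single group rewards contributors whose judgement matches eventual cross-partisan agreement. This is the discernment Coleman says the gate is meant to select for.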

Pseudonymity. An early design assumption — that contributors would rate under their real names, improving trust — turned out to be wrong. Pilot data showed two effects in the opposite direction: (1) people were reluctant to write notes on controversial topics under their real names for fear of harassment; (2) people are more willing to cross partisan lines when anonymous. The team switched to pseudonymity, improving both note volume and cross-partisan agreement rates.

Reputation and filtering. Contributors who consistently rate bad notes as helpful (i.e., notes that attract cross-partisan unhelpfulness ratings) lose their rating weight in the algorithm. The system does not ban them, but stops counting their ratings. This filters for accuracy without excluding ideologically extreme contributors from the community.
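The weight-dropping rule can be sketched as a per-contributor check followed by a filter on the ratings passed to scoring. Everything here is hypothetical and stands in for real criteria the notes do not specify:

- the history format;
- the 50% limit on endorsing notes that end up rated not helpful;
- the minimum of five such endorsements before any judgement is made.

```python
BAD_ENDORSEMENT_LIMIT = 0.5  # hypothetical: max share of "helpful" votes on notes that end NOT_HELPFUL
MIN_EVIDENCE = 5             # hypothetical: settled endorsements needed before judging a rater


def rating_weight(history):
    """history: (my_rating, final_status) pairs. Returns 1.0 (counted) or 0.0 (ignored)."""
    endorsed = [final for mine, final in history
                if mine == "HELPFUL" and final in ("HELPFUL", "NOT_HELPFUL")]
    if len(endorsed) < MIN_EVIDENCE:
        return 1.0  # too little evidence to down-weight this rater
    bad_share = endorsed.count("NOT_HELPFUL") / len(endorsed)
    return 0.0 if bad_share > BAD_ENDORSEMENT_LIMIT else 1.0


def scorable_ratings(ratings, histories):
    """Drop ratings from zero-weight raters; the raters themselves are not removed."""
    return [(rater, note, value) for rater, note, value in ratings
            if rating_weight(histories.get(rater, [])) > 0.0]


histories = {
    "a": [("HELPFUL", "NOT_HELPFUL")] * 6,  # keeps endorsing notes that end up not helpful
    "b": [("HELPFUL", "HELPFUL")] * 6,
}
ratings = [("a", "n1", 1.0), ("b", "n1", 1.0), ("c", "n1", 0.0)]
print(scorable_ratings(ratings, histories))  # [('b', 'n1', 1.0), ('c', 'n1', 0.0)]
```

Contributor "a" is never removed from the community; only their ratings stop counting. This matches the behaviour described above.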

Incentives. All contribution is voluntary. The team attributes high engagement to the impact asymmetry: an ordinary user with few followers can place a note on a world leader’s post that causes a retraction. Coleman cites a case where a contributor’s note on a White House tweet led the White House to delete and reissue the statement.

Transparency and verifiability

Transparency was a founding principle, not a later addition. The previous industry approach — internal trust and safety teams making editorial decisions — was distrusted precisely because it was a black box. The team’s response was to make the entire system auditable:

Algorithm open-sourced on GitHub. Every scoring update is a public commit. Anyone can inspect the code and understand what rules determine note display.
Data published daily. The full ratings dataset (contributor ratings, note content, scores) is published as downloadable TSV files. This required architectural decisions that imposed engineering costs: the ML pipeline had to be structured so the full algorithm could run as a script on a single machine (requiring ~500 GB RAM and approximately one day of compute), rather than as a distributed service impossible to replicate externally.
Independent replication. Vitalik Buterin published a blog post documenting his independent run of the algorithm on the public data, confirming it does what it claims. The team treats this as a key trust-building event: it is enough for a handful of credible independent verifiers to have checked the system.
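Because the data is published as plain TSV, anyone can read it with standard tools. The sketch below streams a ratings file row by row with Python's `csv` module. The following are assumptions modelled on the public download and should be checked against the current schema:

- the column names `noteId`, `raterParticipantId` and `helpfulnessLevel`;
- the mapping of rating levels to 1.0 / 0.5 / 0.0.

Reading the files is the cheap part. The ~500 GB figure presumably reflects running the full scoring pipeline rather than parsing alone.

```python
import csv
import io
from collections import Counter

# Assumed mapping from rating level to a numeric value; check the published schema.
LEVEL_VALUE = {"HELPFUL": 1.0, "SOMEWHAT_HELPFUL": 0.5, "NOT_HELPFUL": 0.0}


def stream_ratings(tsv_file):
    """Yield (rater, note, value) one row at a time instead of loading the whole file."""
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        value = LEVEL_VALUE.get(row.get("helpfulnessLevel", ""))
        if value is not None:  # rows without a recognised level are skipped in this sketch
            yield row["raterParticipantId"], row["noteId"], value


sample = io.StringIO(
    "noteId\traterParticipantId\thelpfulnessLevel\n"
    "n1\tr1\tHELPFUL\n"
    "n1\tr2\tNOT_HELPFUL\n"
    "n2\tr1\t\n"
)
ratings_per_note = Counter(note for _, note, _ in stream_ratings(sample))
print(ratings_per_note)  # Counter({'n1': 2})
```

Against a real download, `sample` would be replaced by `open(path, newline="", encoding="utf-8")` on a ratings file.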

The team acknowledges the costs: this architectural constraint is unusual and would not have been the natural design choice had transparency not been a hard requirement from day one. Retroactively adding this property would likely have required a system rewrite.

Institutional resilience

Community Notes (originally Birdwatch) survived four leadership regimes, under Jack Dorsey, Kayvon Beykpour, Parag Agrawal, and Elon Musk: leaders with substantially different priorities and dispositions.

Coleman identifies several factors:

Product nature. A product that surfaces notes only when people who normally disagree find them helpful is structurally resistant to ideological capture. Leaders of any political orientation tend to find the notes credible precisely because they cannot easily dismiss them as partisan. The product’s value proposition is self-reinforcing: it works because it is not controlled editorially.

The no-override-button principle. No one at X can change the status of a displayed note. This was a deliberate and unsettling design choice — internal stakeholders initially pushed back hard. But it means no leader can quietly suppress a note that embarrasses the company or its allies. This structural commitment is what made the product trustworthy through transitions that might otherwise have compromised it.

Data-driven expansion. The team never proposed a step without data supporting it. Each phase of expansion (internal pilot → MTurk test → 1,000-person public pilot → US-wide → global) was backed by evidence that quality held. Leaders who encountered the proposal at any stage found a team presenting outcomes, not ideology.

Execution continuity. During the acquisition and post-acquisition turbulence, the team shipped every week. Coleman notes that sustained execution is itself a form of institutional protection: a product visibly working and growing is hard to kill.

The ‘Thermal’ structure. Before the acquisition, the project operated under a programme Kayvon Beykpour created called Thermal: one clear project driver, one clear senior decision-maker as the single escalation path, 100% team focus, self-selected membership, and milestone-based (not OKR-based) goal-setting. Coleman credits this structure with enabling the speed and autonomy that made the product survive its early years. The same structure continued under X, with Elon Musk as the single decision-maker above the team.