Nicole Forsgren on DORA, SPACE, and Measuring Developer Productivity
Nicole Forsgren is a researcher and practitioner in developer productivity. At the time of this episode, she was a partner at Microsoft Research leading the Developer Experience Lab. She was previously CEO of DORA (DevOps Research and Assessment), which Google acquired, and served as VP of Research and Strategy at GitHub. She created the DORA framework and co-created SPACE with colleagues from Microsoft and the broader research community. She is a co-author of Accelerate.
Key ideas
- Speed and stability move together. The most important empirical finding from years of DORA data: teams that deploy more frequently are more stable, not less. Frequent small deployments have a smaller blast radius and faster recovery. Infrequent large deployments create larger failures and slower recovery.
- Elite DORA benchmarks: deploy on demand; lead time under one day; MTTR under one hour; change fail rate 0–15%. These hold regardless of company size: the research found no statistically significant difference between small and large companies on DORA metrics.
- SPACE prevents single-dimension measurement. The five dimensions — Satisfaction, Performance, Activity, Communication/collaboration, Efficiency/flow — exist precisely because optimising for one dimension without the others creates perverse incentives. Use at least three.
- The four-box framework: always start with words, not data. Top row: the hypothesis in plain language. Bottom row: the data proxies for each concept. If the correlation breaks, the proxy may be at fault rather than the hypothesis, so check the measurement before abandoning the theory.
- Most conversations about “we need to move faster” are really conversations about strategy. The bottleneck is often not engineering speed; it is unclear priorities, incomplete strategy, or a failure to decide what not to build.
Episode content
Background and research trajectory
Nicole began as a software engineer at IBM writing software for large enterprise systems. Frustrated by slow, painful deployments, she decided to win the argument with data and earned a PhD in management information systems. The research question: is the way software is developed and delivered tied to measurable outcomes at the individual, team, and organisational level?
She ran the DORA State of DevOps Report annually, first with Puppet, then as CEO of her own company (DORA Inc.), and then at Google after it acquired the company. The research was paired with a SaaS offering that let large companies run the DORA assessment against their own data. That offering was the top of the funnel for strategic consulting, which helped companies understand not just their benchmark but which capabilities to build next.
The DORA four metrics
DORA’s four key metrics are two speed measures and two stability measures:
Speed:
- Deployment frequency: how often code is deployed to production (or released to users)
- Lead time for changes: time from code commit to code running in production
Stability:
- MTTR (mean time to restore): how long it takes to recover from an incident
- Change fail rate: percentage of changes that cause an incident or require rollback
The finding that changed practice: these metrics move in tandem. Teams with high deployment frequency have low MTTR and low change fail rates. The causal explanation is structural: frequent deployment requires small batches. Small batches have small blast radii. Small blast radii produce easier, faster recovery. The reverse is also true: infrequent deployment accumulates large batches, which cause large failures that take longer to resolve.
Elite benchmarks (consistent with 2019 data and later reports): deploy on demand or multiple times daily; lead time under one day; MTTR under one hour; change fail rate 0–15%.
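The four definitions above can be sketched in code. This is a minimal illustration, not DORA's own tooling: the `Deployment` record, its field names, and the sample timestamps are all hypothetical. It also assumes that recovery time is measured from the moment of deployment, and it treats "more than one deploy per day" as a rough stand-in for "on demand", which a raw count cannot really capture.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it was running in production
    failed: bool = False                    # caused an incident or needed rollback
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Compute the two speed and two stability metrics over a time window."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deploys if d.failed]
    # Assumption: recovery time runs from deployment to restoration.
    restore_times = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(
            hours(d.deployed_at - d.committed_at) for d in deploys
        ),
        "mttr_hours": mean(restore_times) if restore_times else 0.0,
        "change_fail_rate": len(failures) / len(deploys),
    }


def elite_checks(m: dict[str, float]) -> dict[str, bool]:
    """Compare each metric with the elite thresholds quoted in the text."""
    return {
        "deploys_per_day": m["deploys_per_day"] > 1,       # rough proxy for "on demand"
        "median_lead_time_hours": m["median_lead_time_hours"] < 24,
        "mttr_hours": m["mttr_hours"] < 1,
        "change_fail_rate": m["change_fail_rate"] <= 0.15,
    }


# Made-up sample: four deployments across a two-day window, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(
        t0 + timedelta(hours=6),
        t0 + timedelta(hours=10),
        failed=True,
        restored_at=t0 + timedelta(hours=10, minutes=30),
    ),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=6)),
    Deployment(t0 + timedelta(days=1, hours=6), t0 + timedelta(days=1, hours=10)),
]
metrics = dora_metrics(sample, window_days=2)
checks = elite_checks(metrics)
print(metrics)
print(checks)
```

Reporting all four values together, rather than a single score, keeps the speed and stability measures visible side by side, which is how the episode presents them.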
DORA vs SPACE
DORA tells you how your delivery pipeline is performing. SPACE tells you how to measure developer productivity more broadly when the pipeline is not the whole story.
SPACE is a framework for picking metrics, not a list of specific metrics to use. The five dimensions:
S — Satisfaction and wellbeing: how satisfied are developers with their work and tools? Nicole notes this is not a touchy-feely measure; declining satisfaction is an early indicator that other dimensions are about to degrade.
P — Performance: outcome metrics — what results did the work produce? Reliability, customer impact, revenue attribution.
A — Activity: counts of things — PRs, commits, deployments. Easy to instrument; dangerous to optimise in isolation (gaming is trivial).
C — Communication and collaboration: how people and systems interact. The searchability of a codebase, the number of hops a ticket takes to reach the right person, the proportion of work offloaded to senior engineers vs. resolved independently.
E — Efficiency and flow: time through the system, ability to get into and stay in deep work. Feedback loop speed.
The rule: use at least three dimensions simultaneously. If you measure only Activity, you incentivise meaningless output. If you measure only Performance, you incentivise cutting corners on safety. Balanced measurement makes gaming harder and gives a more honest picture.
DORA is an implementation of SPACE: it uses Performance (change fail rate), Activity (deployment frequency), and Efficiency/flow (lead time, MTTR).
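The "at least three dimensions" rule lends itself to a simple check on any proposed metric set. The sketch below is illustrative only: the `Dimension` enum, the metric names, and the helper functions are hypothetical, and the DORA mapping simply restates the one given above.

```python
from enum import Enum


class Dimension(Enum):
    SATISFACTION = "S"
    PERFORMANCE = "P"
    ACTIVITY = "A"
    COMMUNICATION = "C"
    EFFICIENCY = "E"


# The four DORA metrics tagged with the SPACE dimensions named in the text.
DORA_METRICS = {
    "change_fail_rate": Dimension.PERFORMANCE,
    "deployment_frequency": Dimension.ACTIVITY,
    "lead_time": Dimension.EFFICIENCY,
    "mttr": Dimension.EFFICIENCY,
}


def covered(metrics: dict[str, Dimension]) -> set[Dimension]:
    """Return the distinct SPACE dimensions a metric set touches."""
    return set(metrics.values())


def missing(metrics: dict[str, Dimension]) -> set[Dimension]:
    """Return the SPACE dimensions a metric set leaves unmeasured."""
    return set(Dimension) - covered(metrics)


def is_balanced(metrics: dict[str, Dimension], minimum: int = 3) -> bool:
    """Apply the rule of thumb: measure at least `minimum` dimensions at once."""
    return len(covered(metrics)) >= minimum


# A metric set made only of activity counts fails the rule; DORA passes it.
activity_only = {"commits": Dimension.ACTIVITY, "pull_requests": Dimension.ACTIVITY}
print(is_balanced(DORA_METRICS), is_balanced(activity_only))
print(sorted(d.name for d in missing(DORA_METRICS)))
```

Listing the missing dimensions is arguably as useful as the pass/fail result: under this mapping, DORA leaves Satisfaction and Communication unmeasured, which is one way to see why surveys complement pipeline data.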
The four-box framework
Nicole uses this framework when advising on measurement strategy:
Draw a 2×2 grid. Top row: words. Bottom row: data.
Left column: the independent variable (what you think is the cause). Right column: the dependent variable (what you think is the outcome).
Example: “customer satisfaction [top left] → return customers [top right].” Fill in the bottom row with concrete data proxies: “CSAT score [bottom left] → number of return transactions [bottom right].”
The value: it separates the conceptual hypothesis from the measurement hypothesis. If the correlation between CSAT and return transactions breaks down, the problem might be in the proxy (CSAT is a poor measure of satisfaction) rather than in the underlying theory (satisfaction drives retention). This allows teams to critique measurements without abandoning the theory — or vice versa.
Advanced mode: start from the data you have and use the four-box to articulate what conceptual relationship the correlation would represent if it holds. This surfaces spurious correlations before they become policies.
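One way to make the four-box concrete is to keep the words and the data proxies in a single record and test the proxies separately from the hypothesis. The sketch below is an assumption-laden illustration, not Nicole's tooling: the `FourBox` class, the `review` helper, and the 0.3 correlation threshold are all hypothetical, and Pearson correlation is just one possible way to check whether the proxies move together.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class FourBox:
    cause_concept: str    # top left: the independent variable, in words
    outcome_concept: str  # top right: the dependent variable, in words
    cause_proxy: str      # bottom left: data standing in for the cause
    outcome_proxy: str    # bottom right: data standing in for the outcome

    def hypothesis(self) -> str:
        return f"{self.cause_concept} -> {self.outcome_concept}"

    def measurement(self) -> str:
        return f"{self.cause_proxy} -> {self.outcome_proxy}"


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def review(box: FourBox, xs: list[float], ys: list[float],
           threshold: float = 0.3) -> str:
    """Report on the bottom row without passing judgement on the top row."""
    r = pearson(xs, ys)
    if abs(r) >= threshold:
        return f"'{box.measurement()}' correlates (r={r:.2f}); consistent with '{box.hypothesis()}'"
    return (f"'{box.measurement()}' is weak (r={r:.2f}); "
            f"question each proxy before dropping '{box.hypothesis()}'")


box = FourBox(
    cause_concept="customer satisfaction",
    outcome_concept="return customers",
    cause_proxy="CSAT score",
    outcome_proxy="number of return transactions",
)
print(box.hypothesis())
print(box.measurement())
```

Calling `review(box, csat_scores, return_counts)` with real series would then flag a weak correlation as a question about the proxies, leaving the plain-language hypothesis to be judged on its own terms.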
Company size and DORA
One of the most robust findings from the DORA research: no statistically significant difference in performance between small and large companies. Nicole describes the reactions she got from each group: large companies said “but we have more complex codebases, this doesn’t apply to us”; small companies said “but large companies have more resources, this doesn’t apply to us.” She offered each group a dropdown menu to pick their excuse.
The exception was retail, which performed better than average — Nicole attributes this to natural selection: retailers that were not already high performers did not survive the retail apocalypse.
The implication: company size is not an excuse. The capabilities required for high DORA performance are available to teams of any size. The constraints are capability choices (automated testing, CI/CD, trunk-based development), not resource constraints.
What the data does not tell you
The key limitation of system instrumentation: it cannot surface what is not recorded. Nicole’s example: she worked with a company where a significant portion of mission-critical code was not in any version control system. System data said nothing about this because the code was not in any system. Only a developer survey surfaced it.
The practical lesson: data from systems and data from people are complementary, not substitutes. Even teams with comprehensive instrumentation should survey developers at least annually, because surveys surface the things that instrumentation cannot.