Nathan Lambert

Post-training lead at the Allen Institute for AI (AI2). Author of the definitive book on Reinforcement Learning from Human Feedback. Works on OLMo — AI2’s fully open-weight language model series. One of the most technically credible public voices on post-training methodology, RL scaling, and the open-weight ecosystem.

Background

Research scientist focused on post-training: supervised fine-tuning, RLHF, reinforcement learning with verifiable rewards (RLVR), and the emerging RL environments frontier. Led the post-training work on OLMo 3. Writes publicly about AI training methodology and the state of open-weight models. His RLHF book is the most comprehensive technical account of post-training available outside closed lab research.

Known for: rigorous practitioner perspective on scaling economics; balanced takes on the China/US AI race; critical view of AGI definitions as operationally unhelpful.

Appearances in this wiki

Episode	Source	Date
Dylan Patel and Nathan Lambert on DeepSeek and China AI	Lex Fridman Podcast	2025
Nathan Lambert and Sebastian Raschka on State of AI in 2026	Lex Fridman Podcast	2026

Writing in this wiki

Article	Publication	Date
The American DeepSeek Project	Interconnects	4 July 2025

Key positions

Three independent scaling axes all remain active: pre-training, RL with verifiable rewards, and inference-time compute
Data quality drove OLMo 3’s competitiveness despite using fewer tokens than some rivals
Chinese open-weight releases are a durable market-access strategy, not a temporary anomaly
AGI framing is semantically awkward; track task completion rates instead
Autonomous agents operating with minimal oversight are 2026’s most consequential development