Nathan Lambert
Post-training lead at the Allen Institute for AI (AI2). Author of the definitive book on Reinforcement Learning from Human Feedback (RLHF). Works on OLMo, AI2's fully open language model series (weights, training data, and code are all released). One of the most technically credible public voices on post-training methodology, RL scaling, and the open-weight ecosystem.
Background
Research scientist focused on post-training: supervised fine-tuning, RLHF, reinforcement learning with verifiable rewards (RLVR), and the emerging work on RL environments. Led the post-training work on OLMo 3. Writes publicly about AI training methodology and the state of open-weight models. His RLHF book is the most comprehensive technical account of post-training available outside closed-lab research.
Known for: a rigorous practitioner's perspective on scaling economics; balanced takes on the US-China AI race; skepticism of AGI definitions as operationally unhelpful.
Appearances in this wiki
| Episode | Source | Date |
|---|---|---|
| Dylan Patel and Nathan Lambert on DeepSeek and China AI | Lex Fridman Podcast | 2025 |
| Nathan Lambert and Sebastian Raschka on State of AI in 2026 | Lex Fridman Podcast | 2026 |
Key positions
- Three independent scaling axes all remain active: pre-training, RL with verifiable rewards, and inference-time compute
- OLMo 3's competitiveness came from data quality, despite training on fewer tokens than some rivals
- Chinese open-weight releases are a durable market-access strategy, not a temporary anomaly
- AGI framing is semantically awkward; track task completion rates instead
- Autonomous agents operating with minimal oversight are 2026’s most consequential development