Notes — How I Use LLMs
Source: raw/llm-use-script.md | Author: Andrej Karpathy | Feb 2025
Four questions [Adler frame]
Q1 — What is it about as a whole? A practical companion to “Deep Dive.” Shows how Karpathy personally uses LLM tools in daily professional and personal life. Heavy on demos and examples; light on theory.
Q2 — How is it argued? Screen-share demo format. Each capability is illustrated by a real example from Karpathy’s own use: caffeine in an Americano, gradient-check bug, Ca-AKG supplement research, Korean vocabulary learning, blood test analysis. The argument is inductive: here is what I do, here is what I observe, here is the habit.
Q3 — Is it true, in whole or part? Specific tool availability (Claude without web search, Gemini 2.0 Pro without search, Grok 3 without Python interpreter) reflects the ecosystem state at recording time (Feb 2025) and will have changed. The practical habits — few-shot prompting, context hygiene, tool matching, verification discipline — are durable.
Q4 — What of it? The most immediately actionable of the three technical talks. The verification discipline section is the most important: read the code, follow the citations, ask for transcription before trusting image extraction. The “LLM Council” framing is useful for high-stakes decisions.
Glossary
- LLM Council — Karpathy’s term for querying multiple frontier models on the same question and comparing.
- Thinking model — a model trained with RL on verifiable domains; reasons before responding; slower but more accurate on hard tasks.
- Deep Research — a feature (OpenAI Pro tier, later Perplexity, Grok) that issues many searches over minutes to produce a research report.
- Artifacts — Claude feature that renders code (React, Mermaid) inline in the browser.
- Advanced Voice Mode — ChatGPT’s native audio mode; processes audio tokens directly rather than via text.
- NotebookLM — Google tool for chatting with uploaded documents; can generate custom podcasts.
- Super Whisper — third-party macOS transcription app for system-wide speech-to-text.
- Custom GPTs — saved few-shot prompt presets in ChatGPT.
Key claims by section
Ecosystem [§ The Growing LLM Ecosystem]
- Key providers: ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google), Grok (xAI), DeepSeek (China), Mistral/Le Chat (France). [§ The Growing LLM Ecosystem]
- Chatbot Arena (lmarena.ai) ranks by blind human comparisons; SEAL leaderboard (Scale AI) by benchmarks. [§ The Growing LLM Ecosystem]
ChatGPT interaction [§ ChatGPT Interaction Under the Hood]
- Under the hood: one shared token stream; chat bubbles are presentation. [§ ChatGPT Interaction Under the Hood]
- New Chat wipes the context window. [§ ChatGPT Interaction Under the Hood]
- Entity you’re talking to: “a 1 TB zip file… read the internet six months ago and only remembers it vaguely.” [§ ChatGPT Interaction Under the Hood]
Pricing tiers [§ Be Aware of the Model You’re Using]
- Free: GPT-4o mini. Plus ($20/mo): 80 messages/3h on GPT-4o. Pro ($200/mo): unlimited GPT-4o + reasoning models. [§ Be Aware of the Model You’re Using]
- Same structure across Claude, Gemini, Grok. [§ Be Aware of the Model You’re Using]
Thinking models [§ Thinking Models and When to Use Them]
- Gradient-check bug example: GPT-4o gave a list; o1 Pro (1 min thinking) found the exact issue. [§ Thinking Models and When to Use Them]
- Claude 3.5 Sonnet, Gemini, Grok 3, DeepSeek R1 via Perplexity also found the bug. [§ Thinking Models and When to Use Them]
- Practice: try fast model first; escalate to thinking model if result seems off. [§ Thinking Models and When to Use Them]
Internet search [§ Tool Use: Internet Search]
- Perplexity was first to do this convincingly. [§ Tool Use: Internet Search]
- Rule: if answerable by Google search + skimming top results, use the search tool. [§ Tool Use: Internet Search]
Deep Research [§ Tool Use: Deep Research]
- Ca-AKG example: 10-minute run, 27 sources, mechanisms + human trials + safety concerns. [§ Tool Use: Deep Research]
- ChatGPT Deep Research: most thorough. Perplexity and Grok: shorter. [§ Tool Use: Deep Research]
- Treat as first draft; follow citations. [§ Tool Use: Deep Research]
File uploads [§ File Uploads: Adding Documents to Context]
- Evo 2 paper (30 MB PDF): upload, get summary, ask questions while reading. [§ File Uploads: Adding Documents to Context]
- Wealth of Nations: copy chapter from Project Gutenberg → paste → read alongside. [§ File Uploads: Adding Documents to Context]
Python interpreter [§ Tool Use: Python Interpreter]
- ChatGPT trained to recognise when mental arithmetic is insufficient → opens interpreter. [§ Tool Use: Python Interpreter]
- Grok 3 (at time): no Python interpreter, attempts in-head multiplication, gets remarkably close but wrong. [§ Tool Use: Python Interpreter]
Advanced Data Analysis [§ ChatGPT Advanced Data Analysis]
- OpenAI valuation example: silent substitution of 0.1 for N/A; verbal “1.7 trillion” vs correct ~$20T when corrected. [§ ChatGPT Advanced Data Analysis]
- “A capable but absent-minded junior analyst. Read the code. Verify the numbers.” [§ ChatGPT Advanced Data Analysis]
Artifacts [§ Claude Artifacts]
- Flashcard app from Adam Smith Wikipedia intro: React, flip animations, correct/incorrect tracking, hardcoded Q&A. [§ Claude Artifacts]
- Conceptual diagram from Wealth of Nations chapter: Mermaid tree rendered inline. [§ Claude Artifacts]
Cursor / vibe coding [§ Cursor: Writing Code Professionally]
- Cursor uses Claude 3.7 Sonnet via API; Composer (Cmd+I) operates across multiple files. [§ Cursor: Writing Code Professionally]
- Tic-Tac-Toe demo: set up React repo, delete boilerplate, add confetti on win, install react-confetti, download victory sound. [§ Cursor: Writing Code Professionally]
Audio [§ Audio (Speech) Input/Output]
- ~50% of queries spoken on desktop; ~80% on mobile. [§ Audio]
- Super Whisper: system-wide transcription on hotkey. [§ Audio]
- “Don’t type when you can speak.” [§ Audio]
Advanced Voice Mode [§ Advanced Voice Mode]
- True audio: spectrogram → tokens → model processes natively. No text in the loop. [§ Advanced Voice Mode]
- ChatGPT AVM: conservative, refuses a lot. Grok: more personality modes (romantic, unhinged, conspiracy). [§ Advanced Voice Mode]
NotebookLM [§ NotebookLM: Podcast Generation]
- Upload sources → chat or generate 30-min podcast. [§ NotebookLM]
- Useful for passive listening on topics no existing podcast covers. [§ NotebookLM]
Image input [§ Image Input]
- macOS: Shift+Cmd+4 to screenshot selection → Cmd+V to paste directly into chat. [§ Image Input]
- Always ask model to transcribe image content before trusting what it extracted. [§ Image Input]
Memory / Custom Instructions [§ ChatGPT Memory and Custom Instructions]
- Memory is saved between conversations; editable in Settings. [§ ChatGPT Memory]
- Custom Instructions: set tone, format, domain context so you don’t re-explain each session. [§ Custom Instructions]
Custom GPTs [§ Custom GPTs]
- Korean vocabulary extractor: sentence in → vocabulary table in
Korean ; Englishformat, ready for Anki. [§ Custom GPTs] - Few-shot prompts always outperform zero-shot. Custom GPTs = saved few-shot prompts. [§ Custom GPTs]