Reading Notes

How I Use LLMs

Source: How I Use LLMs

Notes — How I Use LLMs

Source: raw/llm-use-script.md | Author: Andrej Karpathy | Feb 2025


Four questions [Adler frame]

Q1 — What is it about as a whole? A practical companion to “Deep Dive.” Shows how Karpathy personally uses LLM tools in daily professional and personal life. Heavy on demos and examples; light on theory.

Q2 — How is it argued? Screen-share demo format. Each capability is illustrated by a real example from Karpathy’s own use: caffeine in an Americano, gradient-check bug, Ca-AKG supplement research, Korean vocabulary learning, blood test analysis. The argument is inductive: here is what I do, here is what I observe, here is the habit.

Q3 — Is it true, in whole or part? Specific tool availability (Claude without web search, Gemini 2.0 Pro without search, Grok 3 without Python interpreter) reflects the ecosystem state at recording time (Feb 2025) and will have changed. The practical habits — few-shot prompting, context hygiene, tool matching, verification discipline — are durable.

Q4 — What of it? The most immediately actionable of the three technical talks. The verification discipline section is the most important: read the code, follow the citations, ask for transcription before trusting image extraction. The “LLM Council” framing is useful for high-stakes decisions.


Glossary

  • LLM Council — Karpathy’s term for querying multiple frontier models on the same question and comparing.
  • Thinking model — a model trained with RL on verifiable domains; reasons before responding; slower but more accurate on hard tasks.
  • Deep Research — a feature (OpenAI Pro tier, later Perplexity, Grok) that issues many searches over minutes to produce a research report.
  • Artifacts — Claude feature that renders code (React, Mermaid) inline in the browser.
  • Advanced Voice Mode — ChatGPT’s native audio mode; processes audio tokens directly rather than via text.
  • NotebookLM — Google tool for chatting with uploaded documents; can generate custom podcasts.
  • Super Whisper — third-party macOS transcription app for system-wide speech-to-text.
  • Custom GPTs — saved few-shot prompt presets in ChatGPT.

Key claims by section

Ecosystem [§ The Growing LLM Ecosystem]

  • Key providers: ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google), Grok (xAI), DeepSeek (China), Mistral/Le Chat (France). [§ The Growing LLM Ecosystem]
  • Chatbot Arena (lmarena.ai) ranks by blind human comparisons; SEAL leaderboard (Scale AI) by benchmarks. [§ The Growing LLM Ecosystem]

ChatGPT interaction [§ ChatGPT Interaction Under the Hood]

  • Under the hood: one shared token stream; chat bubbles are presentation. [§ ChatGPT Interaction Under the Hood]
  • New Chat wipes the context window. [§ ChatGPT Interaction Under the Hood]
  • Entity you’re talking to: “a 1 TB zip file… read the internet six months ago and only remembers it vaguely.” [§ ChatGPT Interaction Under the Hood]

Pricing tiers [§ Be Aware of the Model You’re Using]

  • Free: GPT-4o mini. Plus ($20/mo): 80 messages/3h on GPT-4o. Pro ($200/mo): unlimited GPT-4o + reasoning models. [§ Be Aware of the Model You’re Using]
  • Same structure across Claude, Gemini, Grok. [§ Be Aware of the Model You’re Using]

Thinking models [§ Thinking Models and When to Use Them]

  • Gradient-check bug example: GPT-4o gave a list; o1 Pro (1 min thinking) found the exact issue. [§ Thinking Models and When to Use Them]
  • Claude 3.5 Sonnet, Gemini, Grok 3, DeepSeek R1 via Perplexity also found the bug. [§ Thinking Models and When to Use Them]
  • Practice: try fast model first; escalate to thinking model if result seems off. [§ Thinking Models and When to Use Them]
  • Perplexity was first to do this convincingly. [§ Tool Use: Internet Search]
  • Rule: if answerable by Google search + skimming top results, use the search tool. [§ Tool Use: Internet Search]

Deep Research [§ Tool Use: Deep Research]

  • Ca-AKG example: 10-minute run, 27 sources, mechanisms + human trials + safety concerns. [§ Tool Use: Deep Research]
  • ChatGPT Deep Research: most thorough. Perplexity and Grok: shorter. [§ Tool Use: Deep Research]
  • Treat as first draft; follow citations. [§ Tool Use: Deep Research]

File uploads [§ File Uploads: Adding Documents to Context]

  • Evo 2 paper (30 MB PDF): upload, get summary, ask questions while reading. [§ File Uploads: Adding Documents to Context]
  • Wealth of Nations: copy chapter from Project Gutenberg → paste → read alongside. [§ File Uploads: Adding Documents to Context]

Python interpreter [§ Tool Use: Python Interpreter]

  • ChatGPT trained to recognise when mental arithmetic is insufficient → opens interpreter. [§ Tool Use: Python Interpreter]
  • Grok 3 (at time): no Python interpreter, attempts in-head multiplication, gets remarkably close but wrong. [§ Tool Use: Python Interpreter]

Advanced Data Analysis [§ ChatGPT Advanced Data Analysis]

  • OpenAI valuation example: silent substitution of 0.1 for N/A; verbal “1.7 trillion” vs correct ~$20T when corrected. [§ ChatGPT Advanced Data Analysis]
  • “A capable but absent-minded junior analyst. Read the code. Verify the numbers.” [§ ChatGPT Advanced Data Analysis]

Artifacts [§ Claude Artifacts]

  • Flashcard app from Adam Smith Wikipedia intro: React, flip animations, correct/incorrect tracking, hardcoded Q&A. [§ Claude Artifacts]
  • Conceptual diagram from Wealth of Nations chapter: Mermaid tree rendered inline. [§ Claude Artifacts]

Cursor / vibe coding [§ Cursor: Writing Code Professionally]

  • Cursor uses Claude 3.7 Sonnet via API; Composer (Cmd+I) operates across multiple files. [§ Cursor: Writing Code Professionally]
  • Tic-Tac-Toe demo: set up React repo, delete boilerplate, add confetti on win, install react-confetti, download victory sound. [§ Cursor: Writing Code Professionally]

Audio [§ Audio (Speech) Input/Output]

  • ~50% of queries spoken on desktop; ~80% on mobile. [§ Audio]
  • Super Whisper: system-wide transcription on hotkey. [§ Audio]
  • “Don’t type when you can speak.” [§ Audio]

Advanced Voice Mode [§ Advanced Voice Mode]

  • True audio: spectrogram → tokens → model processes natively. No text in the loop. [§ Advanced Voice Mode]
  • ChatGPT AVM: conservative, refuses a lot. Grok: more personality modes (romantic, unhinged, conspiracy). [§ Advanced Voice Mode]

NotebookLM [§ NotebookLM: Podcast Generation]

  • Upload sources → chat or generate 30-min podcast. [§ NotebookLM]
  • Useful for passive listening on topics no existing podcast covers. [§ NotebookLM]

Image input [§ Image Input]

  • macOS: Shift+Cmd+4 to screenshot selection → Cmd+V to paste directly into chat. [§ Image Input]
  • Always ask model to transcribe image content before trusting what it extracted. [§ Image Input]

Memory / Custom Instructions [§ ChatGPT Memory and Custom Instructions]

  • Memory is saved between conversations; editable in Settings. [§ ChatGPT Memory]
  • Custom Instructions: set tone, format, domain context so you don’t re-explain each session. [§ Custom Instructions]

Custom GPTs [§ Custom GPTs]

  • Korean vocabulary extractor: sentence in → vocabulary table in Korean ; English format, ready for Anki. [§ Custom GPTs]
  • Few-shot prompts always outperform zero-shot. Custom GPTs = saved few-shot prompts. [§ Custom GPTs]