Notes — How I Use LLMs

Notes on Andrej Karpathy — standalone talk, February 2025.

Four questions [Adler frame]

Q1 — What is it about as a whole? A practical companion to ‘Deep Dive.’ Shows how Karpathy personally uses LLM tools in daily professional and personal life. Heavy on demos and examples; light on theory.

Q2 — How is it argued? Screen-share demo format. Each capability is illustrated by a real example from Karpathy’s own use: caffeine in an Americano, gradient-check bug, Ca-AKG supplement research, Korean vocabulary learning, blood test analysis. The argument is inductive: here is what I do, here is what I observe, here is the habit.

Q3 — Is it true, in whole or part? Specific tool availability (Claude without web search, Gemini 2.0 Pro without search, Grok 3 without Python interpreter) reflects the ecosystem state at recording time (Feb 2025) and will have changed. The practical habits — few-shot prompting, context hygiene, tool matching, verification discipline — are durable.

Q4 — What of it? The most immediately actionable of the three technical talks. The verification discipline section is the most important: read the code, follow the citations, ask for transcription before trusting image extraction. The ‘LLM Council’ framing is useful for high-stakes decisions.

Glossary

LLM Council — Karpathy’s term for querying multiple frontier models on the same question and comparing.
Thinking model — a model trained with RL on verifiable domains; reasons before responding; slower but more accurate on hard tasks.
Deep Research — a feature (OpenAI Pro tier, later Perplexity, Grok) that issues many searches over minutes to produce a research report.
Artifacts — Claude feature that renders code (React, Mermaid) inline in the browser.
Advanced Voice Mode — ChatGPT’s native audio mode; processes audio tokens directly rather than via text.
NotebookLM — Google tool for chatting with uploaded documents; can generate custom podcasts.
Super Whisper — third-party macOS transcription app for system-wide speech-to-text.
Custom GPTs — saved few-shot prompt presets in ChatGPT.

Key claims by section

Ecosystem [§ The Growing LLM Ecosystem]

Key providers: ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google), Grok (xAI), DeepSeek (China), Mistral/Le Chat (France). [§ The Growing LLM Ecosystem]
Chatbot Arena (lmarena.ai) ranks by blind human comparisons; SEAL leaderboard (Scale AI) by benchmarks. [§ The Growing LLM Ecosystem]

ChatGPT interaction [§ ChatGPT Interaction Under the Hood]

Under the hood: one shared token stream; chat bubbles are presentation. [§ ChatGPT Interaction Under the Hood]
New Chat wipes the context window. [§ ChatGPT Interaction Under the Hood]
Entity you’re talking to: ‘a 1 TB zip file… read the internet six months ago and only remembers it vaguely.’ [§ ChatGPT Interaction Under the Hood]

Pricing tiers [§ Be Aware of the Model You’re Using]

Free: GPT-4o mini. Plus ($20/mo): 80 messages/3h on GPT-4o. Pro ($200/mo): unlimited GPT-4o + reasoning models. [§ Be Aware of the Model You’re Using]
Same structure across Claude, Gemini, Grok. [§ Be Aware of the Model You’re Using]

Thinking models [§ Thinking Models and When to Use Them]

Gradient-check bug example: GPT-4o gave a list; o1 Pro (1 min thinking) found the exact issue. [§ Thinking Models and When to Use Them]
Claude 3.5 Sonnet, Gemini, Grok 3, DeepSeek R1 via Perplexity also found the bug. [§ Thinking Models and When to Use Them]
Practice: try fast model first; escalate to thinking model if result seems off. [§ Thinking Models and When to Use Them]

Internet search [§ Tool Use: Internet Search]

Perplexity was first to do this convincingly. [§ Tool Use: Internet Search]
Rule: if answerable by Google search + skimming top results, use the search tool. [§ Tool Use: Internet Search]

Deep Research [§ Tool Use: Deep Research]

Ca-AKG example: 10-minute run, 27 sources, mechanisms + human trials + safety concerns. [§ Tool Use: Deep Research]
ChatGPT Deep Research: most thorough. Perplexity and Grok: shorter. [§ Tool Use: Deep Research]
Treat as first draft; follow citations. [§ Tool Use: Deep Research]

File uploads [§ File Uploads: Adding Documents to Context]

Evo 2 paper (30 MB PDF): upload, get summary, ask questions while reading. [§ File Uploads: Adding Documents to Context]
Wealth of Nations: copy chapter from Project Gutenberg → paste → read alongside. [§ File Uploads: Adding Documents to Context]

Python interpreter [§ Tool Use: Python Interpreter]

ChatGPT trained to recognise when mental arithmetic is insufficient → opens interpreter. [§ Tool Use: Python Interpreter]
Grok 3 (at time): no Python interpreter, attempts in-head multiplication, gets remarkably close but wrong. [§ Tool Use: Python Interpreter]

Advanced Data Analysis [§ ChatGPT Advanced Data Analysis]

OpenAI valuation example: silent substitution of 0.1 for N/A; verbal ‘1.7 trillion’ vs correct ~$20T when corrected. [§ ChatGPT Advanced Data Analysis]
‘A capable but absent-minded junior analyst. Read the code. Verify the numbers.’ [§ ChatGPT Advanced Data Analysis]

Artifacts [§ Claude Artifacts]

Flashcard app from Adam Smith Wikipedia intro: React, flip animations, correct/incorrect tracking, hardcoded Q&A. [§ Claude Artifacts]
Conceptual diagram from Wealth of Nations chapter: Mermaid tree rendered inline. [§ Claude Artifacts]

Cursor / vibe coding [§ Cursor: Writing Code Professionally]

Cursor uses Claude 3.7 Sonnet via API; Composer (Cmd+I) operates across multiple files. [§ Cursor: Writing Code Professionally]
Tic-Tac-Toe demo: set up React repo, delete boilerplate, add confetti on win, install react-confetti, download victory sound. [§ Cursor: Writing Code Professionally]

Audio [§ Audio (Speech) Input/Output]

~50% of queries spoken on desktop; ~80% on mobile. [§ Audio]
Super Whisper: system-wide transcription on hotkey. [§ Audio]
‘Don’t type when you can speak.’ [§ Audio]

Advanced Voice Mode [§ Advanced Voice Mode]

True audio: spectrogram → tokens → model processes natively. No text in the loop. [§ Advanced Voice Mode]
ChatGPT AVM: conservative, refuses a lot. Grok: more personality modes (romantic, unhinged, conspiracy). [§ Advanced Voice Mode]

NotebookLM [§ NotebookLM: Podcast Generation]

Upload sources → chat or generate 30-min podcast. [§ NotebookLM]
Useful for passive listening on topics no existing podcast covers. [§ NotebookLM]

Image input [§ Image Input]

macOS: Shift+Cmd+4 to screenshot selection → Cmd+V to paste directly into chat. [§ Image Input]
Always ask model to transcribe image content before trusting what it extracted. [§ Image Input]

Memory / Custom Instructions [§ ChatGPT Memory and Custom Instructions]

Memory is saved between conversations; editable in Settings. [§ ChatGPT Memory]
Custom Instructions: set tone, format, domain context so you don’t re-explain each session. [§ Custom Instructions]

Custom GPTs [§ Custom GPTs]

Korean vocabulary extractor: sentence in → vocabulary table in Korean ; English format, ready for Anki. [§ Custom GPTs]
Few-shot prompts always outperform zero-shot. Custom GPTs = saved few-shot prompts. [§ Custom GPTs]