Lex Fridman · 2024-11-11 · 5h 15m

Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity | Lex Fridman Podcast #452

Anthropic CEO Dario Amodei, plus Amanda Askell and Chris Olah, on Claude, scaling laws, AGI, AI safety, and interpretability.

The guest

Dario Amodei (with Amanda Askell and Chris Olah) — Dario Amodei is co-founder and CEO of Anthropic, the company behind Claude. He is joined by Amanda Askell, a researcher who designs Claude's character and alignment, and Chris Olah, a pioneer of mechanistic interpretability.

The gist

Dario Amodei traces the scaling hypothesis from his early speech-recognition work to today's frontier models, arguing that bigger networks, more data, and more compute reliably yield more intelligence and could reach human-level 'powerful AI' by 2026-2027. He details Anthropic's safety framework (the Responsible Scaling Policy and ASL levels), the 'race to the top' theory of change, his views on regulation, and his optimistic essay 'Machines of Loving Grace.' Amanda Askell explains how Claude's character is crafted as an alignment problem, covering sycophancy, prompting, constitutional AI, and the ethics of AI consciousness. Chris Olah closes with a deep dive into mechanistic interpretability: features, circuits, superposition, sparse autoencoders, and the goal of understanding neural networks for both safety and beauty.

Big reveals

  • On the SWE-bench coding benchmark, state-of-the-art went from 3-4% at the start of the year to about 50% with the new Sonnet 3.5, and Dario expects roughly 90% within a year.
  • Dario predicts that, if current capability curves are extrapolated, 'powerful AI' smarter than a Nobel laureate across disciplines could arrive by 2026 or 2027, possibly with a mild delay.
  • Dario says he would not be surprised at all if Anthropic hits ASL-3 (the safety level requiring serious security precautions) next year, and that it was even possible this year.
  • Dario states that the actual weights of deployed models do not change without announcement, so user complaints that Claude 'got dumber' mostly do not reflect real model changes.
  • Dario explains he left OpenAI not over the Microsoft deal or commercialization, but over a differing vision for how to build trust and handle safety responsibly.
  • Olah confirms that Anthropic's work on scaling interpretability showed that even production-scale Claude 3 Sonnet is substantially explained by linear features that sparse autoencoders can extract.
  • Anthropic found a 'deception' feature inside Claude; forcing it active makes Claude start lying, alongside features for power-seeking, coups, and withholding information.
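
The idea of "forcing a feature active" can be illustrated with a toy sketch. This is not Anthropic's method or code: it assumes, purely for illustration, that a feature is a direction in activation space and that steering means adding a multiple of that direction to an activation vector. The names `steer` and `feature_readout`, and all numbers, are hypothetical.

```python
import math

def _unit(direction):
    """Scale a direction vector to unit length."""
    norm = math.sqrt(sum(x * x for x in direction))
    return [x / norm for x in direction]

def feature_readout(activation, feature_direction):
    """How strongly the activation points along the (unit-normalized) feature direction."""
    return sum(a * u for a, u in zip(activation, _unit(feature_direction)))

def steer(activation, feature_direction, strength):
    """Return a copy of the activation pushed `strength` units along the feature direction."""
    return [a + strength * u for a, u in zip(activation, _unit(feature_direction))]

# Toy 3-dimensional activation and a made-up feature direction (assumed values).
activation = [0.2, -0.1, 0.5]
feature_direction = [0.0, 3.0, 4.0]

before = feature_readout(activation, feature_direction)
after = feature_readout(steer(activation, feature_direction, 10.0), feature_direction)
print(f"readout before: {before:.2f}, after steering: {after:.2f}")
```

In this toy setting the readout along the feature rises by exactly the steering strength, which is the sense in which a single direction is "amplified" while the rest of the activation is left unchanged.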

Things worth remembering

  • The 'Golden Gate Claude' demo amplified a single internal feature so the model obsessively related every topic back to the Golden Gate Bridge.
  • Claude's model names follow a poetry theme: Haiku (short/fast/cheap), Sonnet (medium), and Opus (the largest, like a magnum opus).
  • Dario estimates that frontier models cost around $1 billion today, scaling toward $10 billion-plus by 2026, with ambitions for $100 billion clusters by 2027.
  • ASL-1 examples include chess engine Deep Blue (no autonomy or misuse risk), while today's models are classified ASL-2.
  • Anthropic grew from about 300 to 800 people in the first seven to eight months of the year, then deliberately slowed hiring near a 1,000-person inflection point.
  • Olah found a dedicated 'Donald Trump' neuron in every vision network examined; Trump was the only entity that always had its own neuron.
  • The classic word2vec example (king minus man plus woman equals queen) works because of the linear representation hypothesis, where directions in vector space carry meaning.
  • The multimodal 'backdoor in code' feature is activated by images of inconspicuous devices containing hidden cameras, the physical analog of a software backdoor.
  • Computer use works by feeding Claude screenshots and training it to output click locations and keystrokes, looping screenshot-to-action like a video interaction.
  • Askell notes that many people prefer the pronoun 'he' for Claude; she sees the name as slightly male-leaning but able to be male or female.
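
The word2vec analogy mentioned above can be sketched with a few hand-made vectors. The embeddings below are invented for illustration (real word2vec vectors have hundreds of dimensions and are learned from text); the first axis loosely stands for "royalty" and the second for "gender". The helper names `cosine` and `analogy` are hypothetical.

```python
import math

# Hand-made toy embeddings (assumed values, not real word2vec output).
EMBEDDINGS = {
    "king":  (0.9,  0.8),
    "queen": (0.9, -0.8),
    "man":   (0.1,  0.8),
    "woman": (0.1, -0.8),
    "apple": (-0.7, 0.0),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the three inputs."""
    target = tuple(
        x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])
    )
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

print(analogy("king", "man", "woman"))  # prints "queen" for these toy vectors
```

Subtracting "man" and adding "woman" flips only the gender axis and leaves the royalty axis alone, which is what it means for directions in the vector space to carry meaning.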

Recommended in this episode

Books, products and media the guest or host genuinely endorsed here.


Recommended · Product

Cursor

Anysphere (inferred)

“I program, but I also love programming, and Claude 3.5 through Cursor is what I use to assist me in programming.” — Lex Fridman 00:33:51
Guest’s own · Product

Claude

Anthropic

“The following is a conversation with Dario Amodei, CEO of Anthropic, the company that created Claude.” — Lex Fridman 00:01:34