Lex Fridman · 2026-01-31 · 4h 25m

State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490

Lex Fridman, Sebastian Raschka, and Nathan Lambert survey the 2026 state of AI: LLMs, scaling, open models, China, agents, and AGI.

The guests

Sebastian Raschka and Nathan Lambert — Sebastian Raschka is a machine learning researcher and author of 'Build a Large Language Model from Scratch' and 'Build a Reasoning Model from Scratch.' Nathan Lambert is the post-training lead at the Allen Institute for AI (Ai2) and author of a definitive book on Reinforcement Learning from Human Feedback.

The gist

In this multi-guest roundtable, Lex Fridman talks with researchers Sebastian Raschka and Nathan Lambert about the state of AI roughly a year after the 'DeepSeek moment.' They debate who is winning between US and Chinese labs, dissect the explosion of open-weight models, and explain the technical lineage from GPT-2 to today's architectures (Mixture of Experts, attention variants, KV cache). A large portion covers training stages, scaling laws, and the rise of reinforcement learning with verifiable rewards (RLVR) and inference-time scaling. They also discuss education and learning, careers in AI, tool use, continual learning, robotics, timelines to AGI/ASI, the business and consolidation landscape, NVIDIA's moat, and Nathan's ATOM Project to build American open models.

Big reveals

  • Raschka argues there will be no winner-take-all in AI because researchers rotate between labs, so ideas are not proprietary; the differentiating factor will be budget and hardware, not technology access.
  • DeepSeek is described as losing its crown as China's preeminent open-model maker as Z.ai (GLM), MiniMax, and Moonshot AI (Kimi) rise, even though DeepSeek kicked off the broad Chinese open-weight movement.
  • Despite rapid AI progress, the autoregressive transformer derived from GPT-2 remains essentially the same architecture; advances come from data, systems, and training stages rather than fundamental architecture changes.
  • Lambert reveals he was on the Ai2 team (Tulu 3 work) that coined the term RLVR before DeepSeek, while crediting DeepSeek with the actual breakthrough of scaling reinforcement learning.
  • Unlike RLVR and inference-time scaling, RLHF has no clean scaling law where log-increasing compute yields linear performance gains; the seminal RLHF scaling paper is about reward model over-optimization.
  • Lambert details the ATOM Project (American Truly Open Models) to build and host high-quality US open-weight models to compete with China's open-source ecosystem, noting Ai2 received a $100M NSF grant over four years.
  • Cursor reportedly updates its Composer model weights every 90 minutes based on real-world user feedback, described as the closest thing to real-world RL happening on a deployed model.
  • Anthropic lost a court case in 2025 and owed $1.5 billion to authors, tied to torrenting books, even though buying and scanning books was cleared as legal.

Things worth remembering

  • DeepSeek's pre-training reportedly cost about $5 million at cloud market rates, and OLMo 3 cost roughly $2 million to rent the cluster, while serving millions of users costs billions in compute.
  • Leaders cite scaling laws holding for about 13 orders of magnitude of compute as a reason scaling is unlikely to stop soon.
  • Pre-training dataset sizes are measured in trillions of tokens; researcher models use 5-10 trillion, Qwen is documented up to 50 trillion, and closed labs are rumored to reach 100 trillion tokens.
  • A survey of 791 professional developers found both junior and senior devs ship AI-generated code, with senior developers more likely to ship code that is over 50% AI-generated, and about 80% finding work more enjoyable with AI.
  • The guests note a brief roughly 10-year window where exams could be digital yet uncheatable; after AI, education is reverting to blue books and oral exams.
  • The '996' work culture (9am to 9pm, six days a week, about 72 hours) originated in China and has been adopted in Silicon Valley AI companies.
  • OpenAI's average employee compensation is cited as over a million dollars in stock per year.
  • Manus.ai, a Singapore-based company, was founded about eight months prior and had a roughly $2 billion exit; Groq (~$20B) and Scale AI (~$30B) are cited as consolidation deals.
  • In the DeepSeek R1 paper, the longer the model trains, the longer its responses grow, and the 'aha moment' is when the model self-corrects and recognizes its own mistakes.
  • NVIDIA's real moat is described as the CUDA ecosystem built over roughly two decades, not the GPU chip itself, though LLMs may make replicating something like CUDA easier.

Recommended in this episode

Books, products, and media the guests or host endorsed in this episode.

Recommended · Book

Build a Large Language Model from Scratch

Sebastian Raschka

“Sebastian is the author of two books I highly recommend for beginners and experts alike. First is Build a Large Language Model from Scratch” — Lex Fridman 00:00:31
Recommended · Book

Build a Reasoning Model from Scratch

Sebastian Raschka

“two books I highly recommend for beginners and experts alike. First is Build a Large Language Model from Scratch and Build a Reasoning Model from Scratch” — Lex Fridman 00:00:31
Guest’s own · Book

Reinforcement Learning from Human Feedback

Nathan Lambert

“Nathan is the post-training lead at the Allen Institute for AI, author of the definitive book on Reinforcement Learning from Human Feedback” — Lex Fridman 00:01:01
Recommended · Book

Season of the Witch

David Talbot (inferred)

“It's a great book, Season of the Witch. I recommend it. A bunch of my SF friends who get out recommended it to me.” — Nathan Lambert 02:28:01
Recommended · Product

Claude Code

Anthropic

“You can select the same models on all of them and ask questions, and it's very interesting. Claude Code is way better in that domain. It's remarkable.” — Lex Fridman 02:23:36
Recommended · Product

Cursor Composer

Anysphere (inferred)

“I should say I use Composer a lot because one of the benefits it has is that it's fast.” — Lex Fridman 03:39:17
Recommended · Product

Grok 4 Heavy

xAI

“I actually do use Grok 4 Heavy for debugging. For like hardcore debugging that the other ones can't solve, I find that it's the best at.” — Sebastian Raschka 00:17:39