DeepMind's David Silver explains how AlphaGo, AlphaZero, and MuZero used self-play reinforcement learning to master games and discover superhuman creativity.

David Silver — Leader of DeepMind's reinforcement learning research group and the lead researcher on AlphaGo and AlphaZero, who also co-led AlphaStar and MuZero. He is one of the central figures behind modern deep reinforcement learning.
David Silver traces his path from programming a BBC Micro at age seven and building games to a PhD applying reinforcement learning to the game of Go. He explains the core of reinforcement learning, why Go was considered impossible for AI, and how deep learning plus Monte Carlo tree search produced AlphaGo's historic 2016 win over Lee Sedol. He details the leap to AlphaGo Zero and AlphaZero, which learned entirely from self-play with no human data, and MuZero, which learns even without being told the rules. The conversation closes on creativity, intrinsic reward, and a layered view of the meaning of life and intelligence.
Books, products and media the guest or host endorsed in this episode.
“Let me mention that cryptocurrency in the context of the history of money is fascinating. I recommend Ascent of Money as a great book on this history.” — Lex Fridman, 00:01:33