Michael Littman: Reinforcement Learning and the Future of AI

The guest

Michael Littman: Computer science professor at Brown University who researches and teaches machine learning, reinforcement learning, and AI. A prolific writer of computer-science parody songs who once appeared in a TurboTax commercial, he has had a front-row seat to RL's history since the 1980s.

The gist

Lex Fridman talks with reinforcement learning researcher Michael Littman in a lighthearted, wide-ranging conversation. They open with sci-fi, music taste, Littman's parody-song hobby, and his TurboTax commercial before diving into AI. Littman lays out why he is skeptical of the superintelligence existential-threat argument, arguing we will learn to control these systems as we build them. They trace the history of reinforcement learning from TD-Gammon to AlphaGo and AlphaZero, debate the limits of self-play and language models like GPT-3, and explore why driving is a surprisingly social problem. The episode closes with book recommendations and Littman's view that the meaning of life is balance.

Big reveals

Littman admits a strong opinion: he is 'not particularly moved' by the idea that we will accidentally create a superintelligence that destroys humanity.
00:19:50
He reveals he once wrote an op-ed pushing back on Elon Musk's AI warnings, arguing Musk's belief in the power of ideas is both his strength and his blind spot.
00:22:27
Littman estimates humanity may have had a 30-40 percent chance of destroying itself with nuclear weapons in the 20th century.
00:28:41
He confesses that from ages 13 to 15 he has almost no memories, having spent those years alone in his room programming his TRS-80.
00:38:32
He admits that generation after generation of his students failed to replicate TD-Gammon's results, which led him to conclude that Gerald Tesauro is a 'neural net whisperer.'
00:55:04
Littman says AlphaGo impressed him more than AlphaZero, disagreeing with colleague Satinder Singh who found the no-human-data version more breathtaking.
01:09:05
Littman reveals he barely reads books and once joked he got into college on a 'help out the illiterate' program.
01:35:38
Watching his son learn to drive convinced him that driving is fundamentally a social-interaction activity, which he sees as a blind spot in self-driving research.
01:38:46

Things worth remembering

In 2011 Littman started listening to the weekly Billboard top 10 on the treadmill and discovered he has 'no musical taste'—he just likes whatever he has heard most recently.
00:05:46
On his TurboTax shoot, about 50 people filled one room, including staff whose only job was holding sun filters and another person whose only job was to keep him from getting lost.
00:10:27
Littman wrote a parody song about the halting problem set to Billy Joel's 'Piano Man,' and is especially proud of its internal rhymes.
00:16:10
He worked at Bellcore with Dave Ackley, first author of the Boltzmann machine paper; Littman describes the Boltzmann machine as the first neural net that could learn XOR.
00:44:41
Littman essentially tried to reinvent reinforcement learning on his own before Dave Ackley handed him Rich Sutton's 1984 temporal-difference (TD) paper and arranged for Sutton to visit.
00:46:14
Littman used the term 'self-play' in his 1996 PhD dissertation; the term 'rollout' comes from backgammon via TD-Gammon.
01:00:15
In top-tier chess, teams of humans working with computer programs can still beat the best standalone programs, though that gap is steadily shrinking.
01:14:17
A 'counter Moore's law' exists: the development cost of each new chip generation also doubles, so the performance gained per development dollar stays roughly constant from cycle to cycle.
01:27:50
Littman frames Moore's law not as one exponential but as hundreds or thousands of S-curves stacked on top of each other.
01:31:30
For his 42nd birthday Littman threw a 'meaning of life' party with slide presentations; his answer was 'balance,' demonstrated with a unicycle and a RipStik.
01:53:40

Recommended in this episode

Books, products, and media the guest or host endorsed in this episode, each with a buy link.

Affiliate link — we may earn a commission at no extra cost to you.

Recommended media

Robot & Frank

Jake Schreier

“There's a movie called Robot and Frank, which I think is really interesting because it's very near-term future.” (Michael Littman, 00:02:37)

Find it on Amazon

Recommended product

Kinesis keyboard

Kinesis

“It's a Kinesis keyboard, which is, uh, this butt-shaped keyboard. Yes, I've seen them. Yeah, they're very, uh, I don't know, sexy, elegant.” (Lex Fridman, 00:57:06)

Find it on Amazon

Recommended book

Program or Be Programmed

Douglas Rushkoff

“I find myself thinking of Program or Be Programmed a lot, by Douglas Rushkoff, um, which was, it basically put out the premise…” (Michael Littman, 01:47:03)

Find it on Amazon

Recommended book

Human Compatible

Stuart Russell

“I think, I think Stuart's book did a remarkably good job, like a, just a celebratory good job at describing AI technology and sort of how it works.” (Michael Littman, 01:49:06)

Find it on Amazon

Recommended book

Exhalation

Ted Chiang

“One sci-fi book to recommend is Exhalations by Ted Chiang, a bunch of short stories.” (Michael Littman, 01:51:08)

Find it on Amazon

Topics

reinforcement learning, AI existential risk, AlphaGo / AlphaZero, self-play, language models, self-driving cars, Moore's law, computer science education

Michael Littman: Reinforcement Learning and the Future of AI | Lex Fridman Podcast #144
