Rohit Prasad: Amazon Alexa and Conversational AI

The guest

Rohit Prasad — Vice President and Head Scientist of Amazon Alexa and one of its original creators. He has led the science behind far-field speech recognition, natural language understanding, and the Alexa Prize conversational AI competition.

The gist

Lex Fridman interviews Rohit Prasad, the head scientist of Amazon Alexa, about the science and philosophy behind conversational AI. They discuss what intelligence and the Turing test mean for machines, the Alexa Prize social bot competition, and how Alexa was built from far-field speech recognition through multi-domain natural language understanding. Prasad emphasizes that trust, transparency, and customer control are paramount for AI in the home, especially around privacy. They explore the future of more conversational, self-learning, and natural assistants, and why reasoning about latent customer goals is the hardest unsolved problem ahead.

Big reveals

When Prasad joined, the far-field speech recognition team was 10 people and 9 of them thought it couldn't be done; he was the lone believer.
01:01:31
Prasad estimates we are still five to ten years away from a social bot sustaining a coherent, engaging 20-minute conversation.
00:16:40
Within six months of collecting far-field data and doubling down on deep learning, error rates were cut by a factor of five, giving him conviction it would work.
01:05:46
Prasad says he personally sees no downside to Alexa listening all the time to improve the experience, as long as customers have control and trust; he contrasts this with public privacy paranoia.
00:50:36
The original inspiration for Alexa was the Star Trek computer, captured in an Amazon 'working backwards' press-release document.
00:54:45
Prasad calls reasoning the hardest of all AI problems and doubts it will be fully solved even in a 40-year horizon.
01:30:54
Amazon launched a feature letting customers opt out of having humans review their voice data for annotation.
00:46:53

Things worth remembering

The Alexa Prize challenges universities to build a social bot that can converse coherently for 20 minutes, judged by real Alexa customers on a 1-to-5 scale.
00:14:33
Alexa launched with about 13 big skills and now has over 90,000 skills.
01:11:32
The wake word 'Alexa' is hard to detect because it shares sound units with phrases like 'I like you' or 'I love my Alexa.'
00:55:17
Alexa is now auto-correcting millions of utterances in the US with no human supervision, learning from instances where users repeat or correct their requests.
01:19:53
Amazon did early work training with distributed GPUs so speech recognition training time scaled linearly with data.
00:59:24
Even in the early days Alexa used a statistical-first approach to language understanding, adding deterministic rules afterward to patch model bugs.
01:08:24
Alexa is integrated into third-party devices including cars, microwaves, appliances, and even toothbrushes.
00:31:17
Amazon designed transparency and control from the start: a light ring shows when audio streams and a physical mute button disables the microphones.
00:45:49

Recommended in this episode

Books, products, and media the guest or host genuinely endorsed in this episode.

Guest’s own product

Amazon Echo

Amazon

“when we build what is now called smart speaker or the first echo we were quite judicious about making these right trade-offs on customers behalf” — guest 00:45:18

Topics

conversational AI, Amazon Alexa, speech recognition, natural language understanding, AI privacy and trust, Alexa Prize, machine learning, AI reasoning