Lex Fridman · 2021-01-04 · 1h 28m

Dan Kokotov: Speech Recognition with AI and Humans | Lex Fridman Podcast #151

Rev's VP of Engineering explains how AI and human transcribers combine to build what Rev bills as the world's best speech-to-text engine.

The guest

Dan Kokotov: VP of Engineering at Rev.ai, the ASR side of Rev, whose Rev.com handles human captioning and transcription. A programmer turned manager, he immigrated from Russia in 1991.

The gist

Lex Fridman talks with Dan Kokotov, VP of Engineering at Rev, about how the company built a leading speech-to-text pipeline by combining machine ASR with a two-sided marketplace of human transcribers. They discuss how Rev grew out of improving the Upwork freelancer model, the data flywheel that gives Rev high-quality labeled audio, and the gap between a machine word error rate of roughly 14% and an estimated 2-3% human word error rate. The conversation broadens into the future of searchable, indexed audio, podcasting and RSS versus exclusivity deals like Spotify's, censorship and platform responsibility, and the transition from being a programmer to managing humans. They close on favorite sci-fi, dystopian films, Stalin-era history, and the meaning of life as creation.

Big reveals

  • Lex states the episode is NOT sponsored by the guest and that no one can buy their way onto the podcast, even though Rev later agreed to sponsor.
  • Rev's ASR sits at roughly 14% word error rate on its test suite, versus an estimated 2-3% human word error rate.
  • Kokotov calls Rev's setup a 'magical flywheel': Rev's business model is being paid to annotate the very data that trains its model.
  • He admits AWS and Amazon Mechanical Turk have terrible interfaces despite huge potential, and most users only touch them via API.
  • Both openly criticize podcast exclusivity, with Lex saying he would turn down, on principle, a $100M exclusivity deal like the one Spotify struck with Joe Rogan.
  • Kokotov argues outrage drives engagement and companies are judged by engagement, framing the core tension of platform design.
  • Kokotov gives a secular answer to the meaning of life: contributing to humanity through creation and raising children.
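Word error rate, the metric behind the 14% and 2-3% figures above, is conventionally defined as the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The edit distance counts substitutions, deletions, and insertions. The Python below is a minimal sketch of that standard definition, not Rev's scoring code. The function name `word_error_rate` is hypothetical, and the sketch splits on whitespace without any text normalization. At 14% WER, roughly one reference word in seven is wrong. A careful human gets two or three wrong per hundred.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / reference words.

    Assumes plain whitespace tokenization; real scorers usually normalize
    case, punctuation, and numbers first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Reference assumes the speaker said "dune"; the hypothesis mirrors the
    # auto-transcribed quote quoted further down this page.
    print(word_error_rate(
        "the third greatest is the god emperor of dune",
        "the third greatest is the god emperor of doom",
    ))
```

Try scoring the transcript's own "god emperor of doom" against a reference that assumes the speaker said "dune". The result is one substitution over nine reference words, a WER of about 0.11. Because insertions count too, WER can exceed 1.0 when the output is much longer than the reference.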

Things worth remembering

  • Kokotov's mother was a programmer in Russia; he played on the equivalent of a 286 PC with floppy disks as a kid.
  • Rev.com handles human/end-user services while Rev.ai is the ASR sub-brand, analogous to WordPress.com vs WordPress.org.
  • Rev was founded by an early Upwork employee to remove the friction of choosing freelancers by standardizing verticals.
  • Rev started with translation (birth certificates, documents) before adding audio transcription.
  • Most Rev transcribers are US-based work-from-home parents, students, or people seeking flexible income away from a 9-to-5.
  • On good audio, a skilled human transcriber takes about 2-3x the audio length to produce a transcript; real-time is very hard.
  • Rev measures itself against Google, Amazon, and Microsoft ASR and claims to beat them on its own internal test set.
  • Captioning that once cost $1/minute is now around $1.25, while automated Temi transcription is about 25 cents/minute.
  • Kokotov links Brave New World's engineered castes to a modern 'genetic sorting' driven by assortative marriage and social-media echo chambers.
  • He praises the creator of Chernobyl, a writer known for comedy with no background in Russia, for capturing its spirit down to the pet bowls.
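The per-minute figures above can be turned into a quick back-of-the-envelope estimate for a single recording. The sketch below is an illustration only. The name `estimate_transcription` is hypothetical. It assumes the approximate prices quoted in these notes ($1.25/minute for human captioning, $0.25/minute for Temi) and the 2-3x transcriber time on good audio. Note that it compares a captioning price with a transcription price, exactly as the notes do.

```python
# Approximate figures quoted in these notes; assumptions, not a price list.
HUMAN_CAPTION_USD_PER_MIN = 1.25
TEMI_USD_PER_MIN = 0.25
HUMAN_TIME_MULTIPLIER = (2, 3)  # skilled transcriber time is ~2-3x the audio length


def estimate_transcription(audio_minutes: float) -> dict:
    """Hypothetical helper: rough cost and human effort for one recording."""
    low, high = HUMAN_TIME_MULTIPLIER
    return {
        "human_cost_usd": audio_minutes * HUMAN_CAPTION_USD_PER_MIN,
        "asr_cost_usd": audio_minutes * TEMI_USD_PER_MIN,
        # Transcriber working time, not delivery turnaround.
        "human_work_minutes": (audio_minutes * low, audio_minutes * high),
    }


if __name__ == "__main__":
    print(estimate_transcription(88))  # this episode runs 1h 28m
```

For this 88-minute episode, the sketch gives $110 for human captions versus $22 for Temi. It also gives 176 to 264 minutes of a skilled transcriber's time.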

Recommended in this episode

Books, products, and media the guest or host endorsed in this episode.


Recommended · Book

Dune

Frank Herbert (inferred)

“the greatest sci-fi novel of all time is dune and the second greatest is the children of dune and the third greatest is the god emperor of doom” — Dan Kokotov 00:03:40
Recommended · Book

Children of Dune

Frank Herbert (inferred)

“the greatest sci-fi novel of all time is dune and the second greatest is the children of dune and the third greatest is the god emperor of doom” — Dan Kokotov 00:03:40
Recommended · Book

God Emperor of Dune

Frank Herbert (inferred)

“the third greatest is the god emperor of doom so i'm i'm a huge fan of the whole uh series” — Dan Kokotov 00:03:40
Recommended · Product

Adobe Premiere

Adobe

“examples are adobe premiere for video editing isotope rx for cleaning up audio auto hotkey on windows for automating keyboard mouse tasks” — Lex Fridman 00:01:34
Recommended · Product

iZotope RX

iZotope

“one other product i've used like that is uh for people who might be familiar is called izotope rx it's for audio editing” — Lex Fridman 00:07:52
Recommended · Product

AutoHotkey

AutoHotkey (inferred)

“auto hotkey on windows for automating keyboard mouse tasks emacs as an ide for everything including the universe itself” — Lex Fridman 00:01:34
Recommended · Product

Emacs

Free Software Foundation (inferred)

“emacs as an ide for everything including the universe itself i can keep on going but you get the idea” — Lex Fridman 00:01:34
Guest’s own · Product

Rev

Rev

“rev put a smile to my face so can you maybe take a step back and say what is rev and how does it work” — Lex Fridman 00:08:54
Guest’s own · Product

Temi

Rev

“then we kind of created this uh temi service i think you might have used it which was kind of asr for the consumer” — Dan Kokotov 00:37:46
Recommended · Book

Creative Selection

Ken Kocienda

“there's that book uh creative selection i don't know if you read it by a apple engineer named ken cocienda it's kind of a great book actually” — Dan Kokotov 00:44:58
Recommended · Book

First, Break All the Rules

Marcus Buckingham (inferred)

“it's a pretty good book which some reason not the name escapes me um about management first break all the rules” — Dan Kokotov 01:07:51
Recommended · Book

Brave New World

Aldous Huxley

“one is brave new world by aldous huxley um and it's kind of incredible how prescient he was” — Dan Kokotov 01:09:56
Recommended · Book

1984

George Orwell (inferred)

“i mean 1984 is good of course as well like if you're talking about you know dystopian novels of the future” — Dan Kokotov 01:10:59
Recommended · Media

Brazil

Terry Gilliam (inferred)

“my favorite kind of uh dystopian science fiction is a movie called brazil which i don't know if you've heard of” — Dan Kokotov 01:11:30
Recommended · Media

Chernobyl

HBO (inferred)

“if you look at the the show hbo show chernobyl it's a really good story of how bureaucracy you know uh leads to catastrophic events” — Dan Kokotov 01:13:04
Recommended · Media

The Death of Stalin

Armando Iannucci (inferred)

“there's a comedic version of this i don't know if you've seen this movie it's called the death of stalin yeah i i like that” — Lex Fridman 01:13:34
Recommended · Media

Downfall

Oliver Hirschbiegel (inferred)

“there's a movie called downfall that people should watch i think it's the last few days of hitler that's a good movie” — Lex Fridman 01:14:05