MIT 6.S094: Recurrent Neural Networks for Steering Through Time

The guest

Lex Fridman — MIT researcher and lecturer teaching the 6.S094 deep learning for self-driving cars course

The gist

This is a solo MIT 6.S094 lecture by Lex Fridman covering recurrent neural networks. He first grounds the audience in backpropagation, walking through a simple gate circuit (add, multiply, max) and the chain rule, then warns about vanishing and exploding gradients and the art of parameter tuning. He transitions to RNNs, explaining their loop structure, parameter sharing across time, backpropagation through time, and the long-term dependency problem that motivates LSTMs. He surveys many LSTM applications including machine translation, handwriting and text generation, image captioning, medical diagnosis, stock prediction, and audio generation. Finally he connects RNNs to driving, describing how Udacity competition winners used LSTMs to map image sequences to steering angles, speed, and torque.
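The loop structure, parameter sharing, and vanishing-gradient points in the summary above can be illustrated with a minimal sketch. This is not code from the lecture: it assumes a scalar vanilla RNN with a tanh activation, and the names `rnn_forward`, `grad_wrt_initial_state`, `w_x`, and `w_h` are hypothetical. The same two weights are reused at every time step, and backpropagation through time multiplies one local gradient per step, so the product can shrink toward zero over long sequences.

```python
import math


def rnn_forward(inputs, w_x, w_h, h0=0.0):
    """Scalar vanilla RNN (illustrative assumption):
    h_t = tanh(w_x * x_t + w_h * h_{t-1}).

    The same w_x and w_h are reused at every step (parameter sharing).
    Returns all hidden states, starting with h0.
    """
    states = [h0]
    for x in inputs:
        # The previous output is looped back in as part of the next input.
        states.append(math.tanh(w_x * x + w_h * states[-1]))
    return states


def grad_wrt_initial_state(states, w_h):
    """Backpropagation through time for d h_T / d h_0.

    Each step contributes the local factor (1 - h_t^2) * w_h, and the
    chain rule multiplies these factors together, walking backward.
    Returns the running product after each backward step.
    """
    grad = 1.0
    history = []
    for h in reversed(states[1:]):
        grad *= (1.0 - h * h) * w_h
        history.append(grad)
    return history


if __name__ == "__main__":
    states = rnn_forward([0.5] * 30, w_x=1.0, w_h=0.5)
    grads = grad_wrt_initial_state(states, w_h=0.5)
    # Gradient magnitude after 1, 10, and 30 backward steps.
    for steps in (1, 10, 30):
        print(steps, abs(grads[steps - 1]))
```

With `|w_h|` below 1 every factor is smaller than 1 in magnitude, so the printed values decay quickly; this is the kind of signal loss that makes long-term dependencies hard for a plain RNN.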

Big reveals

An audience member asks whether there are models that feed their output back into their input; Lex answers that this is exactly what defines a recurrent neural network, which loops its output back in as input.
00:04:45
Lex names vanishing and exploding gradients as the likely culprit when a DeepTraffic network produces no learning, and explains how zero gradients halt weight updates.
00:24:03
Long-term dependency failure is highlighted: vanilla RNNs struggle to remember context from earlier in a sequence, motivating LSTMs.
00:43:27
LSTMs are introduced as the architecture behind all impressive time-series, audio, and video results, with gates that decide what to forget, remember, and output.
00:45:40
The first- and third-place winners of the Udacity self-driving competition used RNNs/LSTMs to map image sequences to steering angles, beating plain CNN approaches.
01:05:43
The first-place winner predicted not just steering angle but also speed and torque, using a sequence length of 10.
01:07:18
Third-place Team Chauffeur used transfer learning, chopping the final layer off a giant CNN to feed 3000 features per frame into an LSTM with a sequence length of 50.
01:11:07
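The forget, remember, and output gating mentioned in the LSTM item above can be made concrete with one cell step. The sketch below uses the standard LSTM equations in scalar form; it is not the competition winners' code, and the names `lstm_step`, `sigmoid`, and the `params` layout are hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step (standard formulation, illustrative layout).

    `params` maps a gate name to a (w_x, w_h, bias) tuple.
    Returns the new hidden state h and cell state c.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much old memory to keep
    i = sigmoid(pre("input"))         # how much new information to write
    g = math.tanh(pre("candidate"))   # the new information itself
    o = sigmoid(pre("output"))        # how much memory to expose
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


if __name__ == "__main__":
    # Assumed toy weights: forget gate held open, input gate held shut,
    # so the cell should carry its memory across a 50-step sequence.
    params = {
        "forget": (0.0, 0.0, 20.0),
        "input": (0.0, 0.0, -20.0),
        "candidate": (1.0, 1.0, 0.0),
        "output": (1.0, 1.0, 0.0),
    }
    h, c = 0.0, 0.8
    for t in range(50):
        h, c = lstm_step(math.sin(t), h, c, params)
    print(c)
```

Because the cell state is updated additively and scaled by the forget gate rather than squashed at every step, a gate near 1 lets information survive many steps, which is the behavior a plain RNN struggles to learn.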

Things worth remembering

Convolutional neural networks can operate on a fixed five-second audio clip, treating it as a single fixed-size input.
00:02:10
The chain rule lets you compute gradients step by step by multiplying local gradients instead of building one giant analytic derivative.
00:13:53
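Backpropagation through the example circuit reduces to three gate types: addition passes the incoming gradient unchanged to every input, multiplication swaps the forward values (each input's gradient is the incoming gradient times the other input's value), and max routes the gradient only to the largest input.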
00:16:32
On ImageNet, where a prediction counts as correct if the true label (cat, dog, and so on) is among the top five guesses, humans score about 96%, a benchmark machines later surpassed.
00:29:49
Evolutionary robotics uses genetic algorithms to evolve soft-bodied robots that learn to walk and swim in simulation.
00:34:36
An LSTM with convolutional layers can take silent video of a drumstick hitting objects and generate the sound it would make.
00:57:50
Stock-prediction LSTMs can ingest news articles over time; crashes are described as the easier case to predict, which would let a model flag an approaching crash.
01:00:30
Geoffrey Hinton jokingly expands SGD as "Stochastic Graduate Student Descent": keep hiring grad students to tune hyperparameters until the problem is solved.
01:14:27
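The chain rule and the three gate rules listed above (add, multiply, max) can be checked on a tiny circuit. The sketch below assumes the circuit `out = max((x + y) * z, w)`; that exact circuit and the function name `circuit` are illustrative choices, not necessarily the one drawn in the lecture.

```python
def circuit(x, y, z, w):
    """Forward and backward pass through out = max((x + y) * z, w).

    Returns the output and a dict of gradients d(out)/d(input),
    computed gate by gate with the chain rule.
    """
    # Forward pass: one gate at a time.
    a = x + y          # add gate
    b = a * z          # multiply gate
    out = max(b, w)    # max gate

    # Backward pass: start with d(out)/d(out) = 1 and walk backward.
    d_out = 1.0

    # Max gate: the gradient goes only to the larger input
    # (a tie is given to b here, an arbitrary choice).
    d_b = d_out if b >= w else 0.0
    d_w = d_out if w > b else 0.0

    # Multiply gate: each input gets the incoming gradient
    # times the *other* input's forward value.
    d_a = z * d_b
    d_z = a * d_b

    # Add gate: the incoming gradient is passed unchanged to both inputs.
    d_x = d_a
    d_y = d_a

    return out, {"x": d_x, "y": d_y, "z": d_z, "w": d_w}


if __name__ == "__main__":
    out, grads = circuit(-2.0, 5.0, -4.0, -20.0)
    print(out, grads)
```

No closed-form derivative of the whole expression is ever written down: each gate only needs its own local rule, and the results are multiplied along the path back to each input.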

Recommended in this episode

Udacity Self-Driving Car Engineer Nanodegree

Udacity

“One of the prizes for the competition is the Udacity Self-Driving Car Engineer Nanodegree for free. This thing is awesome; I encourage everyone to check it out.” (Lex Fridman, 01:04:39)


Topics

recurrent neural networks, LSTM, backpropagation, deep learning, self-driving cars, machine translation, vanishing gradients, transfer learning