Lex Fridman · 2016-09-27 · 1h 25m

Deep Learning for Computer Vision (Andrej Karpathy, OpenAI)

Andrej Karpathy delivers a deep-dive lecture on how convolutional neural networks revolutionized computer vision.

The guest

Andrej Karpathy — Deep learning researcher at OpenAI, creator of Stanford's CS231N convolutional networks course, ConvNetJS, and arxiv-sanity.com

The gist

This is a technical lecture by Andrej Karpathy on deep learning for computer vision, focused on convolutional neural networks (CNNs). He traces the field's history from Hubel and Wiesel's 1960s cat experiments through Fukushima's Neocognitron and Yann LeCun's 1990s LeNet to the 2012 AlexNet breakthrough that transformed the field. He explains the mechanics of convolutional, pooling, and fully connected layers, then walks through the evolution of winning ImageNet architectures (AlexNet, ZFNet, VGGNet, GoogLeNet, ResNet). The talk closes with practical guidance on hardware, software frameworks, architecture selection, hyperparameters, and distributed training, followed by an extended audience Q&A.
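To make the layer mechanics mentioned above concrete, here is a minimal pure-Python sketch of a convolution and a max-pooling step. It is not code from the lecture: the function names are hypothetical, and it assumes square single-channel inputs, one square filter, and no padding inside `conv2d`. Real layers stack many filters over multi-channel volumes.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer along one dimension."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter does not tile the input with this stride/padding")
    return span // stride + 1


def conv2d(image, kernel, stride=1):
    """Slide one 2D filter over a square 2D image (no padding).

    As in most deep learning libraries, the filter is not flipped,
    so this is strictly a cross-correlation.
    """
    k = len(kernel)
    out = conv_output_size(len(image), k, stride)
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            # Dot product between the filter and the image patch it covers.
            total = sum(
                image[i * stride + di][j * stride + dj] * kernel[di][dj]
                for di in range(k)
                for dj in range(k)
            )
            row.append(total)
        result.append(row)
    return result


def max_pool2d(image, size=2, stride=2):
    """Downsample by keeping the maximum of each size x size window."""
    out = conv_output_size(len(image), size, stride)
    return [
        [
            max(
                image[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out)
        ]
        for i in range(out)
    ]
```

For example, `conv_output_size(32, 5)` gives 28: a 5x5 filter sliding over a 32x32 input with stride 1 and no padding produces a 28x28 response map, while 2x2 pooling with stride 2 halves each spatial dimension.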

Big reveals

  • By 2016 ImageNet top-5 error had dropped to about 3.57%, roughly matching or beating the estimated human error of 2-5%.
  • CNNs replaced multi-page hand-engineered feature extraction pipelines with end-to-end trained networks, drastically cutting code complexity.
  • Features learned by pre-training on ImageNet transfer remarkably well to entirely different datasets and tasks.
  • In ~20 years the two main algorithmic advances over LeNet were dropout and ReLU, both essentially one-line changes that set values to zero.
  • Residual networks (ResNet) won ImageNet 2015 and many other challenges, enabling far deeper networks via skip connections acting as a gradient superhighway.
  • ResNets can be made much shallower and wider and still work as well or better, suggesting depth alone is not the key.
  • Karpathy's practical advice: 'don't be a hero' - use pre-trained ImageNet models and fine-tune rather than designing custom architectures.
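The "one-line changes" and skip connections in the list above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the lecture's code: the function names are hypothetical, activations are flat lists of floats, and `dropout` uses the common "inverted" variant that rescales surviving values at training time.

```python
import random


def relu(values):
    """ReLU: set every negative activation to zero."""
    return [max(0.0, v) for v in values]


def dropout(values, drop_prob, rng):
    """Inverted dropout: zero each activation with probability drop_prob.

    Survivors are divided by the keep probability so the expected
    activation is unchanged (an assumption; other variants rescale at test time).
    """
    keep = 1.0 - drop_prob
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def residual(values, layer):
    """Skip connection: output = input + layer(input)."""
    return [x + f for x, f in zip(values, layer(values))]
```

If `layer` outputs all zeros, `residual` passes its input through unchanged. That is one way to read the "gradient superhighway" remark: a block only has to learn a correction on top of the identity, and the addition gives the signal a direct path past the layer.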

Things worth remembering

  • Spectrograms are 2D arrays, images 3D, videos 4D, and text can be treated as a 1D array of numbers.
  • Karpathy built a web interface to measure his own ImageNet accuracy by competing against a CNN, losing points mostly on dog-breed identification.
  • Some ImageNet test images are actually mislabeled, and ImageNet contains 50 different types of terriers.
  • CNN neurons learned to detect faces, wrinkles, and printed text on their own, despite labels only being provided at the final layer.
  • GoogLeNet has only 5 million parameters versus VGGNet's 140 million, mainly by discarding fully connected layers.
  • A 56-layer plain network performs worse than a 20-layer one even on training data, an optimization problem ResNet's skip connections solve.
  • NVIDIA's DGX-1 supercomputer cost about $130,000 at the time of the talk.
  • State-of-the-art networks were typically trained for a few weeks across four or eight GPUs costing about $1,000 each.
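The parameter gap between GoogLeNet and VGGNet noted above comes down to simple arithmetic. The sketch below counts weights plus biases for one fully connected layer and one convolutional layer. The helper names are hypothetical, and the layer sizes are taken from the published VGG-16 configuration (a 7x7x512 volume feeding a 4096-unit layer), not from these notes.

```python
def fc_params(inputs, outputs):
    """Fully connected layer: one weight per input-output pair, plus biases."""
    return inputs * outputs + outputs


def conv_params(filter_size, in_channels, out_channels):
    """Conv layer: weights are shared across positions, so count one filter bank."""
    return filter_size * filter_size * in_channels * out_channels + out_channels


# First fully connected layer of VGG-16 (assumed sizes: 7x7x512 -> 4096).
vgg_first_fc = fc_params(7 * 7 * 512, 4096)

# A 3x3 conv layer with 512 input and 512 output channels, for comparison.
vgg_conv_3x3 = conv_params(3, 512, 512)
```

Under these assumptions the single fully connected layer holds about 103 million parameters, while the 3x3 convolution holds about 2.4 million, which is why dropping fully connected layers shrinks a network so sharply.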

Recommended in this episode

Books, products and media the guest or host genuinely endorsed here.

Guest’s own · Product

ConvNetJS

Andrej Karpathy

“this is ConvNetJS. uh this is um a deep learning library for training convolutional neural networks that I've that is implemented in JavaScript. I wrote this” — Andrej Karpathy 00:26:09
Guest’s own · Product

arxiv-sanity.com

Andrej Karpathy

“I think this is a natural point to plug very briefly my arxiv-sanity.com. So this is the best website ever and what it does is it crawls arXiv” — Andrej Karpathy 00:50:54
Recommended · Product

Keras

“90% of the use cases are probably addressable with things like Keras. So Keras would be my go-to number one uh thing to look at.” — Andrej Karpathy 01:02:23
Recommended · Product

Torch

“I've used Torch for a long time. I still really like Torch. It's very lightweight, interpretable. It works just just fine.” — Andrej Karpathy 01:03:26