Torch Tutorial (Alex Wiltschko, Twitter)

The guest

Alex Wiltschko: machine learning engineer at Twitter and developer of the torch-autograd package. He gives this deep learning school tutorial on the Torch ecosystem.

The gist

This is a technical lecture on the Torch deep learning ecosystem and the Lua language it is built on. The first half covers practical fundamentals: tensors as views into memory, GPU computation, and the nn, optim, and autograd packages for building and training neural networks. The second half dives into automatic differentiation, explaining why reverse-mode (backpropagation) beats forward-mode for neural nets, and how Autograd traces program execution just-in-time to compute gradients through control flow, loops, and recursion. Wiltschko then situates Torch among other libraries (TensorFlow, Theano, Keras, Caffe, Chainer) by granularity and graph-construction strategy, and closes with ideas the field could import from older AD communities like weather modeling.
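
To make the "traces program execution" idea concrete, here is a minimal sketch of tape-based reverse-mode differentiation in plain Python. It is not the torch-autograd API: the `Var` class, the `backward` function, and the toy function `f` are hypothetical names invented for illustration, and only scalar addition and multiplication are supported. The point it shows is that the graph is recorded while ordinary loops and branches run, so the gradient follows whichever path the program actually took.

```python
class Var:
    """A scalar that remembers how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Walk the recorded graph once, from the output back to the inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # parents are appended before their children

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # children are processed before parents
        for parent, local in node.parents:
            parent.grad += node.grad * local


def f(x, n):
    # Ordinary control flow: the branch taken depends on the running value.
    y = Var(1.0)
    for _ in range(n):
        if y.value < 10.0:
            y = y * x
        else:
            y = y + x
    return y


x = Var(2.0)
y = f(x, 5)   # for x = 2 this run computes x**4 + x
backward(y)   # x.grad now holds 4 * x**3 + 1 evaluated at x = 2
```

A different input could take a different branch sequence, and the recorded graph (and therefore the gradient) would change with it, which is the practical benefit of tracing at run time.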

Big reveals

Twitter runs Torch in production at scale: 'every piece of media that comes in to Twitter goes through a torch model at this point.'
00:11:51
Twitter built the torch-autograd package to glue together model pieces as small as addition, multiplication, and subtraction while still getting correct gradients.
00:19:37
Machine learning's key abstraction is reverse-mode automatic differentiation, better known as backpropagation; it is one special case of automatic differentiation (autodiff).
00:25:17
Forward-mode autodiff requires one full evaluation of the function per parameter to get the whole gradient, so for a million-parameter network it is hopelessly expensive, which is why nobody uses it for neural nets.
00:11:11
In production, Autograd's tracing machinery disappears entirely, leaving plain numerical code with no test-time speed penalty.
00:44:25
Twitter serves Torch models by running Lua virtual machines inside Java, communicating over JNI (the Java Native Interface).
00:53:42
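
The cost argument against forward mode in the list above can be sketched with dual numbers in plain Python. This is an illustration under stated assumptions, not code from the talk: `Dual`, `loss`, and `forward_mode_gradient` are hypothetical names, and the loss is a toy sum of squares. Each forward pass carries a single directional derivative, so recovering the full gradient takes one pass per parameter.

```python
class Dual:
    """A value paired with one directional derivative (hypothetical helper)."""

    def __init__(self, value, tangent=0.0):
        self.value = value
        self.tangent = tangent

    def __add__(self, other):
        return Dual(self.value + other.value, self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule applied alongside the ordinary multiplication.
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)


def loss(params):
    # Toy scalar loss: sum of squares of the parameters.
    total = Dual(0.0)
    for p in params:
        total = total + p * p
    return total


def forward_mode_gradient(values):
    """Return the full gradient and the number of forward passes it took."""
    grad, passes = [], 0
    for i in range(len(values)):
        # Seed a tangent of 1 on parameter i and 0 on every other parameter.
        seeded = [Dual(v, 1.0 if j == i else 0.0)
                  for j, v in enumerate(values)]
        grad.append(loss(seeded).tangent)
        passes += 1
    return grad, passes


grad, passes = forward_mode_gradient([1.0, 2.0, 3.0])
# Three parameters needed three full evaluations of loss().
```

With a million parameters the same loop would evaluate the network a million times per gradient, whereas reverse mode gets every partial derivative of a scalar loss from one forward pass and one backward pass.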

Things worth remembering

In the iTorch notebook you can prepend any torch function with a question mark to get its help documentation.
00:03:33
LuaJIT for-loops run at basically the same speed as C, so you pay little speed penalty for the convenient scripting layer.
00:04:05
The entire Lua language is defined by only about 10,000 lines of C code, small enough to learn in an afternoon.
00:04:36
Lua is embedded in unexpected places: World of Warcraft quest scripting, Adobe Lightroom's UI, and the scriptable layers of Redis and nginx.
00:05:37
Lua was originally chosen for Torch because it was far easier than Python to run on embedded chips for machine learning.
00:06:07
A Torch tensor is just a pointer/view into row-major memory defined by size, stride, and offset, so slices share memory rather than copying.
00:13:55
Automatic differentiation has been rediscovered many times; nuclear science, computational fluid dynamics, and weather modeling have more sophisticated AD tools than ML.
00:24:47
Autograd can pass gradients through non-differentiable operations like floor, useful for building a differentiable JPEG or MPEG encoder.
00:35:32
There is a graph type called 'sea of nodes' from Cliff Click's mid-90s thesis that naturally expresses both control and data flow but hasn't been used in a deep learning library.
00:41:16
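
The tensor-as-view point in the list above can be sketched in plain Python. This is not Torch's implementation or API: `TensorView` and `select_column` are hypothetical names, the sketch handles only two dimensions, and it uses 0-based indexing where Lua and Torch are 1-based. It shows how size, stride, and offset are enough to address elements in flat row-major storage, and why a slice built this way shares memory with its parent.

```python
class TensorView:
    """A 2-D view over flat row-major storage (hypothetical class)."""

    def __init__(self, storage, size, stride, offset=0):
        self.storage = storage  # shared flat list, never copied
        self.size = size        # (rows, cols)
        self.stride = stride    # storage steps per index in each dimension
        self.offset = offset    # storage position of element (0, 0)

    def _index(self, i, j):
        return self.offset + i * self.stride[0] + j * self.stride[1]

    def get(self, i, j):
        return self.storage[self._index(i, j)]

    def set(self, i, j, value):
        self.storage[self._index(i, j)] = value

    def select_column(self, j):
        # A column is the same storage and strides with a shifted offset.
        return TensorView(self.storage, (self.size[0], 1), self.stride,
                          self.offset + j * self.stride[1])


storage = list(range(12))                 # flat memory holding 0..11
t = TensorView(storage, (3, 4), (4, 1))   # 3x4 matrix, row-major
col = t.select_column(2)                  # view of column 2, no copy
col.set(1, 0, 99)                         # write through the slice
# t.get(1, 2) now reads 99 because both views address the same storage.
```

Because nothing is copied, slicing is cheap, but writes through a slice are visible in the original tensor, so an explicit copy is needed when independent data is wanted.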

Topics

deep learning, Torch, Lua, automatic differentiation, backpropagation, neural networks, machine learning frameworks, GPU computing