Lex Fridman · 2018-12-16 · 42m

Pieter Abbeel: Deep Reinforcement Learning | Lex Fridman Podcast #10

Berkeley professor Pieter Abbeel on deep reinforcement learning, robot manipulation, self-play, imitation learning, and the hard problem of hierarchical reasoning.

The guest

Pieter Abbeel: Professor at UC Berkeley and director of the Berkeley Robot Learning Lab, a leading researcher on teaching robots to understand and interact with the world using imitation and deep reinforcement learning.

The gist

Lex Fridman talks with UC Berkeley robotics professor Pieter Abbeel about the state of deep reinforcement learning and robotics. They discuss why beating Roger Federer at tennis is as much a hardware as a software problem, the psychology of interacting with robots, and why RL works despite sparse and delayed rewards. Abbeel shares his intuition that neural-net control benefits from being a gradual tiling of linear feedback controllers, and explains the open challenges of hierarchical reasoning and credit assignment over long time horizons. The conversation covers transfer learning, self-play, imitation and third-person learning, simulation ensembles for sim-to-real transfer, AI safety and testing, and ends on whether RL robots could be taught kindness and affection.

Big reveals

  • Abbeel argues a robot beating Federer is both a hardware and software problem; humanoid hardware is maybe 10-15 years away, though a wheeled non-bipedal machine could do it sooner.
  • Rather than explicit numeric rewards, robots can learn from comparative human feedback ('the last five minutes was nicer than the previous'), as shown by Paul Christiano's hopper learning backflips.
  • Abbeel's core intuition: ReLU neural-net control is piecewise-linear feedback control, and its success comes from a gradual tiling of the space with shared linear controllers.
  • He believes credit assignment over the extreme time scales of real life (decisions vs. muscle-fiber contractions) is beyond any current RL algorithm and demands hierarchical reasoning that doesn't yet exist.
  • The RL-squared (learning to reinforcement learn) work pursued faster learning via meta-learning instead of hand-designed hierarchy, and saw hierarchical behaviors emerge in maze navigation.
  • A breakthrough led by Chelsea Finn lets robots learn from third-person human demonstrations, 'like machine translation for demonstrations,' mapping a human's action to the robot's own body.
  • Abbeel proposes using an ensemble of imperfect simulators rather than one precise simulator, treating the real world as just another sample from the distribution of simulators.
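
The piecewise-linear intuition above can be checked on a toy example. The sketch below is only an illustration, not anything shown in the episode: the network size and every weight are made up. It uses a small ReLU policy and recovers the linear feedback law `u = K·x + k` that the network applies inside one activation region.

```python
# Hypothetical tiny ReLU policy: 2-D state -> 3 hidden units -> 1-D control.
# All weights are invented for illustration; they come from no trained model.
W1 = [[1.0, -0.5], [-1.0, 2.0], [0.5, 0.5]]
b1 = [0.0, 0.5, -1.0]
W2 = [0.7, -1.2, 2.0]
b2 = 0.1

def pre_activations(x):
    """Hidden-layer inputs before the ReLU is applied."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]

def policy(x):
    """Control output of the ReLU network for state x."""
    hidden = [max(0.0, z) for z in pre_activations(x)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

def local_linear_controller(x):
    """Return (K, k, active) with policy(y) == K.y + k for every y
    that switches on the same hidden units as x."""
    active = [z > 0.0 for z in pre_activations(x)]
    K = [sum(W2[j] * W1[j][i] for j in range(3) if active[j]) for i in range(2)]
    k = b2 + sum(W2[j] * b1[j] for j in range(3) if active[j])
    return K, k, active

x = [1.0, 0.2]
K, k, active = local_linear_controller(x)
nearby = [1.05, 0.22]  # a nearby state inside the same activation region
linear_prediction = K[0] * nearby[0] + K[1] * nearby[1] + k
print(active, K, k)
print(policy(nearby), linear_prediction)  # the two values agree up to rounding
```

Crossing into a neighboring region flips a single hidden unit, so `K` changes by just that unit's contribution while the rest is shared. That is one way to read the "gradual tiling" with shared linear controllers.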

Things worth remembering

  • Abbeel's robot is named BRETT, the Berkeley Robot for the Elimination of Tedious Tasks.
  • Abbeel saw Boston Dynamics' SpotMini at the MARS event organized by Jeff Bezos, where it was scripted to follow Jeff around the room.
  • At a Fidelity-organized event, a Pepper robot scripted to act like a child jumped into conversations; even though its behavior was 100% scripted, it was hard not to perceive it as a person.
  • Deep RL work at Berkeley began around 2011-2013, with PhD student John Schulman initially driving it forward.
  • Abbeel notes a complex system like a hovering helicopter can be stabilized with relatively simple linear feedback control.
  • The 2012 AlexNet breakthrough's bigger long-term impact was fine-tuning/transfer learning, since what was learned on ImageNet could be reused for new tasks.
  • With teleoperation you can teach a robot a new basic skill, like picking up and placing a bottle, in about 10 minutes.
  • Human drivers pass only a brief driving test, yet average roughly one accident every million to ten million miles, a level of reliability far beyond anything the short test could measure.
  • Abbeel credits Andrew Ng with showing him the value of kindness.
  • Abbeel suggests we don't need human-level reasoning for strong affection, pointing to how much happiness people get from a dog greeting them at home.
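
The note above about stabilizing a hovering helicopter with simple linear feedback can be illustrated with a much simpler stand-in. The sketch below is not a helicopter model: it assumes a one-dimensional double integrator (position and velocity around a hover point), and the gains, time step, and initial drift are all made up. It compares no control with the linear law `u = -k_pos * pos - k_vel * vel`.

```python
def simulate(k_pos, k_vel, steps=2000, dt=0.01):
    """Simulate a toy double integrator under linear state feedback.

    pos is the offset from the hover point, vel its rate of change.
    Returns the final (pos, vel) after steps * dt seconds.
    """
    pos, vel = 1.0, 0.5  # start off target and drifting away (assumed values)
    for _ in range(steps):
        u = -k_pos * pos - k_vel * vel  # linear feedback on the full state
        vel += u * dt                   # control input acts as acceleration
        pos += vel * dt
    return pos, vel

drift_pos, drift_vel = simulate(k_pos=0.0, k_vel=0.0)  # no control: keeps drifting
held_pos, held_vel = simulate(k_pos=4.0, k_vel=4.0)    # feedback: settles at the hover point
print(drift_pos, held_pos)
```

Two constant gains are enough to bring this toy system back to its target. The real helicopter case involves many more state variables, but the control law has the same linear form.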