Reproducing a Reproducible Robotics Benchmark

Authors: Omer Cohen & Raveh Ben Simon

Supervised by: Orr Krupnik

Visit our published materials for this project!
GitHub

Introduction

Standardized evaluation benchmarks have driven progress in machine learning fields such as computer vision, machine translation, and even simulated robotics. Real-world robotic learning, however, has suffered from a lack of standardized benchmark setups. To tackle this issue, a group of researchers from UC Berkeley developed REPLAB: a low-cost, easy-to-assemble robotic arm environment, presented as a benchmark for robotic reaching and object manipulation. The details of this environment are laid out in the paper:

REPLAB: A Reproducible Low-Cost Arm Benchmark Platform for Robotic Learning. Brian Yang, Jesse Zhang, Vitchyr Pong, Sergey Levine, and Dinesh Jayaraman.

Further technical details about how to set up your own REPLAB cell, along with links to the paper and original results, are given on the project website.

We set out to construct our own REPLAB cell in an attempt to gauge its reproducibility.

Continue Reading Reproducing a Reproducible Robotics Benchmark

Deep Reinforcement Learning Works – Now What?

Author: Chen Tessler
chen {dot} tessler {at} campus {dot} technion {dot} ac {dot} il

Two years ago, Alex Irpan wrote a post about why “Deep Reinforcement Learning Doesn’t Work Yet”. Since then, we have made huge algorithmic advances, tackling most of the problems Alex raised. We have methods that are sample efficient [1, 21] and can learn in an off-policy batch setting [22, 23]. When a reward function is lacking, we now have methods that can learn from preferences [24, 25], and even methods better suited to escaping bad local extrema when the return is non-convex [14, 26]. Moreover, we are now capable of training robust agents [27, 28] that generalize to new, previously unseen domains!

Continue Reading Deep Reinforcement Learning Works – Now What?

Causal Inference

Why is Causal Inference Important for Reinforcement Learning?

Authors: Guy Tennenholtz, Shie Mannor and Uri Shalit
guytenn {at} gmail {dot} com

There has been growing interest in relating Causal Inference (CI) to Reinforcement Learning (RL). While there have been impressive achievements in solving high-dimensional RL problems, research at the intersection of RL and CI is still in its infancy. What makes these problems hard, and how do they relate to RL? In this blog post we give our view.

Continue Reading Why is Causal Inference Important for Reinforcement Learning?

Why does reinforcement learning not work (for you)?

Author: Shie Mannor
shie {at} ee {dot} technion {dot} ac {dot} il


So you run a reinforcement learning (RL) algorithm and it performs poorly. What then? Basically, you can try some other algorithm out of the box (PPO/AxC/*QN/Rainbow/etc. [1, 2, 3, 4]) and hope for the best. This approach rarely works. But why? Why don't we have a "ResNet" for RL? By that I mean: why don't we have a network architecture that gets you to 90% of the desired performance with 10% of the effort?

Continue Reading Why does reinforcement learning not work (for you)?

Due to the restriction to Gaussian policies, the approach is incapable of converging to the global optimum, whereas policy gradient approaches over the set of all policies are guaranteed (in the tabular case) to converge to a global optimum.

Distributional Policy Optimization: An Alternative Approach for Continuous Control

Chen Tessler*, Guy Tennenholtz* and Shie Mannor
Published at NeurIPS 2019
Paper, Code

What?

We propose a new optimization framework, named Distributional Policy Optimization (DPO), which optimizes a distributional loss (as opposed to the standard policy gradient).

As opposed to policy gradient methods, DPO is not limited to parametric distribution functions (such as Gaussian and delta distributions) and can thus cope with non-convex returns.
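To make the distinction concrete, here is a minimal PyTorch sketch (our own illustration under assumed network sizes and names, not the implementation released with the paper) contrasting a standard Gaussian actor with a generator-style actor that maps a state and a noise vector to an action. The latter induces an action distribution that is not tied to any parametric family, which is what lets a DPO-style objective cope with multimodal, non-convex returns.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Standard actor: state -> mean/std of a unimodal Gaussian over actions."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        h = self.body(state)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

class ImplicitPolicy(nn.Module):
    """Generator-style actor: (state, noise) -> action. The induced action
    distribution is not restricted to any parametric family, so it can place
    probability mass on several distinct modes of the return."""
    def __init__(self, state_dim, action_dim, noise_dim=8, hidden=64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        z = torch.randn(state.shape[0], self.noise_dim)
        return self.net(torch.cat([state, z], dim=-1))

# Sampling actions for a batch of states: the Gaussian actor is inherently
# unimodal, while the implicit actor can represent multimodal behaviour.
states = torch.randn(4, 3)
gaussian_actions = GaussianPolicy(3, 2)(states).sample()
implicit_actions = ImplicitPolicy(3, 2)(states)
```

The particular generator architecture above is only one way to realize a non-parametric policy; the point is that the actor is trained against a distributional target rather than through the standard policy gradient.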

Author: Chen Tessler @tesslerc

Continue Reading Distributional Policy Optimization: An Alternative Approach for Continuous Control