Due to the restriction to Gaussian policies, the approach is incapable of converging to the global optimum, whereas policy gradient approaches over the entire set of policies are guaranteed (in the tabular case) to converge to a global optimum.

Distributional Policy Optimization: An Alternative Approach for Continuous Control

Chen Tessler*, Guy Tennenholtz* and Shie Mannor
Published at NeurIPS 2019
Paper, Code

What?

We propose a new optimization framework, named Distributional Policy Optimization (DPO), which optimizes a distributional loss (as opposed to the standard policy gradient).

Unlike policy gradient methods, DPO is not limited to parametric distribution functions (such as Gaussian and delta distributions) and can thus cope with non-convex returns.
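To make the distinction concrete, here is a minimal, self-contained sketch (not the paper's implementation; the GenerativePolicy class, the toy bimodal targets, and the network sizes are illustrative assumptions). A generative policy maps a state and a uniform noise sample to an action and is trained with a quantile-regression loss, one concrete instance of a distributional loss, so it can place probability mass on two separated action modes, something a single Gaussian policy cannot represent.

```python
import torch
import torch.nn as nn


class GenerativePolicy(nn.Module):
    """Maps a state and a uniform noise sample tau to an action, so the
    action distribution is implicit and non-parametric (no Gaussian form)."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, tau):
        # tau ~ U(0, 1) plays the role of the quantile fraction
        return self.net(torch.cat([state, tau], dim=-1))


def quantile_loss(pred, target, tau):
    # Quantile regression loss: minimizing it over tau ~ U(0, 1) drives
    # pred(tau) toward the quantile function of the target action samples.
    diff = target - pred
    return ((tau - (diff < 0).float()).abs() * diff.abs()).mean()


# Toy usage: fit the policy to a bimodal set of target actions at -1 and +1,
# a distribution that a single Gaussian policy cannot capture.
torch.manual_seed(0)
policy = GenerativePolicy(state_dim=3, action_dim=1)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(2000):
    state = torch.zeros(256, 3)                      # fixed toy state
    tau = torch.rand(256, 1)
    target = torch.where(torch.rand(256, 1) < 0.5,   # bimodal action targets
                         torch.full((256, 1), -1.0),
                         torch.full((256, 1), 1.0))
    loss = quantile_loss(policy(state, tau), target, tau)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: random tau values now yield actions near both -1 and +1.
with torch.no_grad():
    samples = policy(torch.zeros(5, 3), torch.rand(5, 1))
```

In a full algorithm the target actions would come from a policy-improvement step (e.g. actions scored above the current value estimate) rather than a fixed toy distribution; the point here is only that the distributional loss imposes no parametric form on the resulting policy.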

Author: Chen Tessler @tesslerc
