Due to the restriction to Gaussian policies, the approach is incapable of converging to the global optimum, whereas policy gradient approaches over the entire set of policies are guaranteed (in the tabular case) to converge to a global optimum.

Distributional Policy Optimization: An Alternative Approach for Continuous Control

Chen Tessler*, Guy Tennenholtz* and Shie Mannor
Published at NeurIPS 2019
Paper, Code

What?

We propose a new optimization framework, named Distributional Policy Optimization (DPO), which optimizes a distributional loss (as opposed to the standard policy gradient).

Unlike policy gradient methods, DPO is not limited to parametric distribution functions (such as Gaussian and delta distributions) and can thus cope with non-convex returns.
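To make the distinction concrete, here is a minimal, self-contained sketch (not the paper's implementation; the GenerativePolicy class, the toy bimodal targets, and the network sizes are illustrative assumptions). A generative policy maps a state and a uniform noise sample to an action and is trained with a quantile-regression loss, one concrete instance of a distributional loss, so it can place probability mass on two separated action modes, something a single Gaussian policy cannot represent.

```python
import torch
import torch.nn as nn


class GenerativePolicy(nn.Module):
    """Maps a state and a uniform noise sample tau to an action, so the
    action distribution is implicit and non-parametric (no Gaussian form)."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, tau):
        # tau ~ U(0, 1) plays the role of the quantile fraction
        return self.net(torch.cat([state, tau], dim=-1))


def quantile_loss(pred, target, tau):
    # Quantile regression loss: minimizing it over tau ~ U(0, 1) drives
    # pred(tau) toward the quantile function of the target action samples.
    diff = target - pred
    return ((tau - (diff < 0).float()).abs() * diff.abs()).mean()


# Toy usage: fit the policy to a bimodal set of target actions at -1 and +1,
# a distribution that a single Gaussian policy cannot capture.
torch.manual_seed(0)
policy = GenerativePolicy(state_dim=3, action_dim=1)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(2000):
    state = torch.zeros(256, 3)                      # fixed toy state
    tau = torch.rand(256, 1)
    target = torch.where(torch.rand(256, 1) < 0.5,   # bimodal action targets
                         torch.full((256, 1), -1.0),
                         torch.full((256, 1), 1.0))
    loss = quantile_loss(policy(state, tau), target, tau)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: random tau values now yield actions near both -1 and +1.
with torch.no_grad():
    samples = policy(torch.zeros(5, 3), torch.rand(5, 1))
```

In a full algorithm the target actions would come from a policy-improvement step (e.g. actions scored above the current value estimate) rather than a fixed toy distribution; the point here is only that the distributional loss imposes no parametric form on the resulting policy.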

Author: Chen Tessler @tesslerc
