Natural Policy Gradient

E441106

Natural Policy Gradient is a reinforcement learning optimization method that improves policy gradient updates by accounting for the geometry of the parameter space using the Fisher information matrix, leading to more stable and efficient learning.
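As a rough illustration of the update this entity describes, a single natural gradient step can be sketched as below. The function name, step size, and damping term are illustrative assumptions for numerical stability, not part of the method's canonical description.

```python
import numpy as np

def natural_gradient_step(theta, g, F, alpha=0.05, damping=1e-3):
    """One natural policy gradient update: theta <- theta + alpha * F^{-1} g.

    theta:   current policy parameters, shape (d,)
    g:       vanilla policy gradient estimate, shape (d,)
    F:       estimated Fisher information matrix, shape (d, d)
    damping: small ridge term so the solve stays well-posed when F is
             estimated from few samples (an implementation choice, not
             part of the update rule itself).
    """
    d = theta.size
    # Solve (F + damping * I) x = g rather than forming F^{-1} explicitly.
    natural_g = np.linalg.solve(F + damping * np.eye(d), g)
    return theta + alpha * natural_g
```

Preconditioning the gradient by F^{-1} is what makes the update invariant to smooth reparameterizations of the policy, in contrast to vanilla gradient ascent on the same objective.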

Statements (46)

instanceOf: optimization method; policy gradient method; reinforcement learning algorithm
aimsFor: better-conditioned updates; more sample-efficient learning; more stable learning
appliedIn: continuous control; episodic reinforcement learning; robotics
assumes: differentiable policy parameterization
basedOn: natural gradient
canBeApproximatedBy: Kronecker-factored approximations; conjugate gradient methods
canUse: compatible function approximation
challenge: computational cost of Fisher matrix inversion
convergenceProperty: often more robust than vanilla policy gradient
describedIn: A Natural Policy Gradient
estimationMethod: Monte Carlo sampling; likelihood ratio gradient estimator
field: machine learning; reinforcement learning
goal: maximize expected return
improvesOver: standard policy gradient
inspired: Truncated Natural Policy Gradient; Trust Region Policy Optimization
introducedBy: Sham Kakade
mathematicalFoundation: Riemannian optimization; information geometry
objectiveType: on-policy objective
optimizes: parameterized policy; stochastic policy
property: accounts for geometry of parameter space; invariant to smooth reparameterizations; uses Riemannian metric induced by Fisher information
publicationYear: 2001
relatedTo: Actor-Critic methods; Proximal Policy Optimization; Trust Region Policy Optimization
requires: estimation of Fisher information matrix
updateRule: theta_{k+1} = theta_k + alpha * F^{-1} * g
updateType: first-order method in natural gradient space
usedWith: linear policies; neural network policies
uses: Fisher information matrix; inverse Fisher matrix F^{-1}; policy gradient g
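
The statements above record that NPG requires an estimate of the Fisher information matrix (estimationMethod: Monte Carlo sampling) and that the costly matrix inversion can be approximated by conjugate gradient methods. A minimal sketch of both, assuming access to a score function grad_theta log pi_theta(a | s) and using hypothetical names (score_fn, fvp), follows:

```python
import numpy as np

def fisher_monte_carlo(score_fn, states, actions):
    """Monte Carlo estimate of the Fisher information matrix.

    score_fn(s, a) must return grad_theta log pi_theta(a | s) as a
    1-D array; F is the expected outer product of that score,
    approximated here by averaging over sampled (state, action) pairs.
    """
    scores = np.stack([score_fn(s, a) for s, a in zip(states, actions)])
    return scores.T @ scores / len(scores)

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximate x = F^{-1} g using only Fisher-vector products.

    fvp(v) should return F @ v, so F never has to be formed or
    inverted explicitly, which sidesteps the inversion-cost
    challenge listed above.
    """
    x = np.zeros_like(g)
    r = g.copy()              # residual of F x = g at x = 0
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Fp = fvp(p)
        step = rs / (p @ Fp)
        x += step * p
        r -= step * Fp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

For small problems one can pass fvp = lambda v: F @ v with F from fisher_monte_carlo; in practice the Fisher-vector product is usually computed directly from samples without ever materializing F.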

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

TRPO relatedTo Natural Policy Gradient