Natural Policy Gradient
E441106
Natural Policy Gradient is a reinforcement learning optimization method that improves policy gradient updates by accounting for the geometry of the parameter space using the Fisher information matrix, leading to more stable and efficient learning.
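The update the description refers to can be written out explicitly. Using the standard notation (not spelled out in this entry, but consistent with its `updateRule` and `uses` statements), the natural gradient preconditions the policy gradient with the inverse Fisher information matrix:

```latex
\theta_{k+1} = \theta_k + \alpha\, F(\theta_k)^{-1}\, \nabla_\theta J(\theta_k),
\qquad
F(\theta) = \mathbb{E}_{s,\,a \sim \pi_\theta}\!\left[
  \nabla_\theta \log \pi_\theta(a \mid s)\,
  \nabla_\theta \log \pi_\theta(a \mid s)^{\top}
\right]
```

Here \(J(\theta)\) is the expected return and \(F(\theta)\) the Fisher information of the policy distribution; preconditioning by \(F^{-1}\) is what makes the update invariant to smooth reparameterizations of \(\theta\).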
Statements (46)
| Predicate | Object |
|---|---|
| instanceOf | optimization method; policy gradient method; reinforcement learning algorithm |
| aimsFor | better-conditioned updates; more sample-efficient learning; more stable learning |
| appliedIn | continuous control; episodic reinforcement learning; robotics |
| assumes | differentiable policy parameterization |
| basedOn | natural gradient |
| canBeApproximatedBy | Kronecker-factored approximations; conjugate gradient methods |
| canUse | compatible function approximation |
| challenge | computational cost of Fisher matrix inversion |
| describedIn | A Natural Policy Gradient |
| convergenceProperty | often more robust than vanilla policy gradient |
| estimationMethod | Monte Carlo sampling; likelihood ratio gradient estimator |
| field | machine learning; reinforcement learning |
| goal | maximize expected return |
| improvesOver | standard policy gradient |
| inspired | Truncated Natural Policy Gradient; Trust Region Policy Optimization |
| introducedBy | Sham Kakade |
| mathematicalFoundation | Riemannian optimization; information geometry |
| objectiveType | on-policy objective |
| optimizes | parameterized policy; stochastic policy |
| property | accounts for geometry of parameter space; invariant to smooth reparameterizations; uses Riemannian metric induced by Fisher information |
| publicationYear | 2001 |
| relatedTo | Actor-Critic methods; Proximal Policy Optimization; Trust Region Policy Optimization |
| requires | estimation of Fisher information matrix |
| updateRule | `theta_{k+1} = theta_k + alpha * F^{-1} * g` |
| updateType | first-order method in natural gradient space |
| usedWith | linear policies; neural network policies |
| uses | Fisher information matrix; inverse Fisher matrix `F^{-1}`; policy gradient `g` |
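The update rule stated above (`theta_{k+1} = theta_k + alpha * F^{-1} * g`) can be sketched concretely. The following is a minimal NumPy illustration for a softmax policy over a few discrete actions, estimating both the likelihood-ratio gradient `g` and the Fisher matrix `F` from Monte Carlo samples, as the `estimationMethod` and `requires` statements describe. The function names, the bandit setting, and the `damping` term added for a stable matrix solve are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def softmax(logits):
    """Softmax policy probabilities (numerically stabilized)."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def natural_pg_step(theta, reward_fn, alpha=0.5, n_samples=500,
                    damping=1e-3, rng=None):
    """One natural policy gradient step: theta + alpha * F^{-1} g.

    g is the likelihood-ratio (score function) policy gradient and
    F = E[score score^T] is the Fisher information matrix, both
    estimated by Monte Carlo sampling from the current policy.
    `damping` regularizes the solve because F is singular for
    softmax policies (the score sums to zero) -- an illustrative
    choice, not specified by the source.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = theta.size
    p = softmax(theta)
    g = np.zeros(n)
    F = np.zeros((n, n))
    for _ in range(n_samples):
        a = rng.choice(n, p=p)
        r = reward_fn(a)
        score = -p.copy()          # grad log pi(a) = e_a - p for softmax
        score[a] += 1.0
        g += r * score
        F += np.outer(score, score)
    g /= n_samples
    F /= n_samples
    nat_grad = np.linalg.solve(F + damping * np.eye(n), g)
    return theta + alpha * nat_grad
```

A usage sketch on a hypothetical 3-armed bandit where only arm 0 pays reward: repeated steps shift probability mass toward arm 0, and because the step is taken in natural-gradient space, its size does not collapse as the softmax saturates the way the vanilla gradient's does.

```python
rng = np.random.default_rng(42)
theta = np.zeros(3)
for _ in range(50):
    theta = natural_pg_step(theta, lambda a: 1.0 if a == 0 else 0.0, rng=rng)
```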
Referenced by (1)