Actor-Critic using Kronecker-Factored Trust Region
E441103
Actor-Critic using Kronecker-Factored Trust Region (ACKTR) is a reinforcement learning algorithm that improves sample efficiency and training stability by using Kronecker-factored approximate curvature (K-FAC) to compute trust-region natural gradient updates for both the actor and the critic.
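As a rough illustration of the Kronecker-factored idea, the sketch below computes an approximate natural gradient for a single dense layer in NumPy. It assumes the Fisher block for a weight matrix W (out × in) is approximated as A ⊗ S, with A = E[a aᵀ] over the layer inputs and S = E[g gᵀ] over the back-propagated pre-activation gradients, so the inverse can be applied as S⁻¹ ∇W A⁻¹ without ever forming the full Fisher matrix. The function name `kfac_natural_gradient` and the simple per-factor damping are illustrative assumptions, not the original ACKTR implementation.

```python
import numpy as np


def kfac_natural_gradient(grad_W, acts, grads_out, damping=1e-3):
    """Approximate natural gradient for one dense layer with weights W of shape (out, in).

    grad_W:    (out, in) ordinary gradient of the loss with respect to W
    acts:      (batch, in) layer inputs a
    grads_out: (batch, out) back-propagated gradients g w.r.t. the pre-activations
    damping:   illustrative Tikhonov damping added to each factor (assumption)
    """
    n = acts.shape[0]
    # Kronecker factors of the Fisher block, F ~ A kron S
    A = acts.T @ acts / n + damping * np.eye(acts.shape[1])                # E[a a^T], (in, in)
    S = grads_out.T @ grads_out / n + damping * np.eye(grads_out.shape[1])  # E[g g^T], (out, out)
    # (A kron S)^{-1} vec(G) = vec(S^{-1} G A^{-1}); use solves instead of explicit inverses
    left = np.linalg.solve(S, grad_W)      # S^{-1} G, shape (out, in)
    return np.linalg.solve(A, left.T).T    # (S^{-1} G) A^{-1}, valid because A is symmetric


# Usage on random data: 64 samples, 4 inputs, 3 outputs
rng = np.random.default_rng(0)
a = rng.normal(size=(64, 4))
g = rng.normal(size=(64, 3))
G = g.T @ a / 64                           # ordinary gradient E[g a^T]
step = kfac_natural_gradient(G, a, g)
print(step.shape)  # (3, 4)
```

Because only the small factors A (in × in) and S (out × out) are solved against, the cost scales with the layer's input and output widths rather than with the square of the number of weights, which is what makes the Kronecker factorization practical for deep networks.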
Statements (46)
| Predicate | Object |
|---|---|
| instanceOf | actor-critic method; policy gradient method; reinforcement learning algorithm |
| abbreviation | ACKTR |
| aimsTo | improve sample efficiency; improve training stability |
| appliedTo | policy parameters; value function parameters |
| approximates | Fisher information matrix |
| basedOn | trust region optimization |
| category | deep learning optimization method; second-order reinforcement learning method |
| comparedWith | A2C in original paper; TRPO in original paper |
| constrains | policy update step size via trust region |
| designedFor | deep reinforcement learning |
| evaluatedOn | Atari 2600 benchmark; MuJoCo continuous control tasks |
| implementedIn | TensorFlow in original code release |
| improves | data efficiency compared to first-order methods; stability compared to vanilla policy gradient |
| introducedBy | Elman Mansimov; Jimmy Ba; Roger B. Grosse; Shun Liao; Yuhuai Wu |
| introducedIn | paper "Scalable Trust-Region Method for Deep Reinforcement Learning Using Kronecker-Factored Approximation" |
| openSource | true |
| optimizes | actor network; critic network |
| publishedAt | NIPS 2017 |
| relatedTo | A2C; A3C; TRPO (Trust Region Policy Optimization); natural policy gradient |
| targets | maximization of expected return |
| uses | Kronecker-factored approximate curvature (K-FAC); actor-critic architecture; advantage estimates; mini-batch updates; natural gradient; on-policy learning; second-order optimization information; stochastic gradient estimates |
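The `constrains` statement says the policy update step size is limited by a trust region. One way to sketch this, assuming dense layers with K-FAC factors (A, S), is to use the second-order estimate of the KL change, ½ η² Δθᵀ F̂ Δθ for a natural-gradient direction Δθ, and pick the largest η that keeps it below a radius δ, capped at a maximum learning rate η_max: η = min(η_max, √(2δ / Δθᵀ F̂ Δθ)). The function `trust_region_step_size`, its parameter names and its default values are hypothetical choices for illustration.

```python
import numpy as np


def trust_region_step_size(nat_grads, fisher_factors, eta_max=0.25, delta=0.001):
    """Step size that keeps a K-FAC natural-gradient update inside a KL trust region.

    nat_grads:      list of (out, in) natural-gradient directions, one per dense layer
    fisher_factors: list of (A, S) Kronecker factors for the same layers
    eta_max, delta: illustrative maximum learning rate and trust-region radius (assumptions)
    """
    # dtheta^T F dtheta summed over layers, using F ~ A kron S per layer and
    # vec(X)^T (A kron S) vec(X) = trace(X^T S X A) for symmetric A.
    quad = sum(float(np.trace(X.T @ S @ X @ A)) for X, (A, S) in zip(nat_grads, fisher_factors))
    if quad <= 0.0:
        return eta_max  # degenerate curvature estimate: fall back to the cap
    # Second-order KL estimate 0.5 * eta^2 * quad <= delta  =>  eta <= sqrt(2 * delta / quad)
    return min(eta_max, float(np.sqrt(2.0 * delta / quad)))


# Usage: one layer with identity factors and an all-ones direction (quad = 4)
X = np.ones((2, 2))
print(trust_region_step_size([X], [(np.eye(2), np.eye(2))], eta_max=1.0, delta=0.5))  # 0.5
```

Capping at η_max means small, well-conditioned updates keep the ordinary learning rate, while the √(2δ / Δθᵀ F̂ Δθ) term shrinks any step whose predicted KL change would exceed the radius.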
Referenced by (2)