ACKTR
E98477
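ACKTR (Actor-Critic using Kronecker-Factored Trust Region) is an on-policy deep reinforcement learning algorithm that trains an actor and a critic with approximate natural-gradient updates. It uses a Kronecker-factored approximation of the curvature matrix to keep second-order optimization tractable and constrains each update to a trust region, with the aim of improving training stability and sample efficiency.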
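As an illustration of the Kronecker-factored approximation mentioned above, the following is a minimal NumPy sketch for a single fully connected layer. It assumes the layer's curvature matrix is approximated as the Kronecker product of two small matrices: the second moment of the layer inputs and the second moment of the gradients with respect to the pre-activations. The function name `kfac_precondition`, the additive damping scheme, and the damping value are hypothetical choices made for this example; this is not the reference implementation.

```python
import numpy as np


def kfac_precondition(grad_w, acts, backprops, damping=1e-2):
    """Approximate natural gradient for one linear layer (illustrative sketch).

    grad_w:    (out, in) gradient of the loss w.r.t. the layer weights
    acts:      (batch, in) inputs to the layer
    backprops: (batch, out) gradients w.r.t. the layer pre-activations
    damping:   assumed additive damping that keeps both factors invertible
    """
    batch = acts.shape[0]
    # Kronecker factors: second moments of inputs and of back-propagated gradients.
    a_factor = acts.T @ acts / batch + damping * np.eye(acts.shape[1])
    g_factor = backprops.T @ backprops / batch + damping * np.eye(backprops.shape[1])
    # For symmetric factors, (A kron G)^{-1} vec(grad_w) = vec(G^{-1} grad_w A^{-1}),
    # so only the two small factors are ever inverted.
    right = np.linalg.solve(a_factor, grad_w.T).T   # grad_w @ A^{-1}
    nat_grad = np.linalg.solve(g_factor, right)     # G^{-1} @ grad_w @ A^{-1}
    return nat_grad, a_factor, g_factor


# Toy usage with random data (checks shapes only; not a trained network).
rng = np.random.default_rng(0)
acts = rng.normal(size=(32, 4))
backprops = rng.normal(size=(32, 3))
grad_w = backprops.T @ acts / 32
nat_grad, a_factor, g_factor = kfac_precondition(grad_w, acts, backprops)
```

The point of the factorization is cost: the dense curvature matrix for this layer would be 12 x 12, while the two factors are 4 x 4 and 3 x 3, and the gap grows quickly with layer width.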
Statements (37)
| Predicate | Object |
|---|---|
| instanceOf | actor-critic algorithm; reinforcement learning algorithm |
| abbreviationOf | Actor-Critic using Kronecker-Factored Trust Region |
| aimsToImprove | sample efficiency; training stability |
| approximates | natural gradient |
| basedOn | actor-critic framework; trust region optimization |
| category | policy gradient method; value-based method |
| combines | policy gradient learning; value function estimation |
| comparedWith | A2C; A3C; PPO; TRPO |
| designedFor | policy optimization; value function learning |
| field | deep reinforcement learning |
| fullName | Actor-Critic using Kronecker-Factored Trust Region |
| hasProperty | on-policy; sample efficient; stable training dynamics |
| introducedAs | efficient natural gradient actor-critic method |
| objective | maximize expected cumulative reward |
| optimizationType | second-order method |
| usedIn | Atari benchmarks; control tasks |
| usesApproximation | Kronecker-factored curvature matrix |
| usesComponent | actor network; critic network |
| usesGradientInformation | curvature-aware updates |
| usesNaturalGradient | true |
| usesNeuralNetworks | true |
| usesOptimizationMethod | Kronecker-factored approximation; second-order optimization |
| usesTrustRegion | true |
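
The trust-region and natural-gradient statements above can be made concrete with a small sketch. It assumes the trust region is enforced by shrinking the step size so that the quadratic approximation of the KL divergence, 0.5 * (eta * update)^T F (eta * update), does not exceed a bound `delta`. The function name, the callable `fisher_vector_product`, and the default values of `delta` and `eta_max` are hypothetical and chosen only for illustration.

```python
import numpy as np


def trust_region_step_size(update, fisher_vector_product, delta=1e-3, eta_max=0.25):
    """Pick a step size that keeps the approximate KL change within delta.

    update:                flat array holding the proposed parameter update
    fisher_vector_product: callable v -> F @ v for the (approximate) curvature F
    delta:                 assumed trust-region radius on the KL divergence
    eta_max:               assumed upper bound on the step size
    """
    quad = float(update @ fisher_vector_product(update))  # update^T F update
    if quad <= 0.0:
        # Degenerate curvature estimate: fall back to the maximum step size.
        return float(eta_max)
    return float(min(eta_max, np.sqrt(2.0 * delta / quad)))


# Toy usage with a fixed diagonal curvature matrix.
fisher = np.diag([4.0, 1.0])
step = trust_region_step_size(
    np.array([1.0, 0.0]), lambda v: fisher @ v, delta=0.5, eta_max=1.0
)
```

In this toy case the step size is capped by the trust region rather than by `eta_max`: a large update along a high-curvature direction is scaled down, while a small update would simply use `eta_max`.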
Referenced by (3)
| Subject (surface form when different) | Predicate |
|---|---|
| OpenAI Baselines | implementsAlgorithm |
| A3C | inspiredAlgorithms |
| Stable Baselines | supportsAlgorithm |