A3C
E99656
A3C (Asynchronous Advantage Actor-Critic) is a reinforcement learning algorithm in which multiple parallel worker agents each interact with their own copy of the environment and apply asynchronous gradient updates to a shared policy and value function.
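The core loop can be sketched roughly as follows, assuming PyTorch; the `ActorCritic` and `worker` names, the layer sizes, the plain SGD optimizer, and the random tensors standing in for an environment rollout are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp


class ActorCritic(nn.Module):
    """Small shared-body actor-critic network (layer sizes are illustrative)."""

    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.policy_head = nn.Linear(64, n_actions)  # actor: action logits
        self.value_head = nn.Linear(64, 1)           # critic: state-value baseline

    def forward(self, obs):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)


def worker(shared_model, n_updates=100, rollout_len=5):
    """One asynchronous worker: sync with the shared model, compute a local
    gradient, and apply it to the shared parameters without locking.

    Random tensors stand in for a real rollout so the sketch stays
    self-contained; the paper's workers each step their own copy of the
    environment and use bootstrapped n-step returns.
    """
    local_model = ActorCritic()
    # The paper uses RMSProp (optionally with shared statistics); plain SGD
    # over the *shared* parameters keeps this sketch simple.
    optimizer = torch.optim.SGD(shared_model.parameters(), lr=1e-3)

    for _ in range(n_updates):
        local_model.load_state_dict(shared_model.state_dict())  # pull latest params

        obs = torch.randn(rollout_len, 4)   # stand-in for observations
        returns = torch.randn(rollout_len)  # stand-in for bootstrapped returns
        logits, values = local_model(obs)
        actions = torch.distributions.Categorical(logits=logits).sample()

        log_probs = torch.log_softmax(logits, dim=-1)
        chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
        advantages = returns - values                    # advantage estimate
        loss = (-(chosen * advantages.detach()).mean()   # policy loss
                + 0.5 * advantages.pow(2).mean())        # value loss

        optimizer.zero_grad()
        loss.backward()
        # Copy local gradients onto the shared parameters, then step them
        # asynchronously (no lock, Hogwild-style).
        for local_p, shared_p in zip(local_model.parameters(),
                                     shared_model.parameters()):
            shared_p.grad = local_p.grad
        optimizer.step()


if __name__ == "__main__":
    shared = ActorCritic()
    shared.share_memory()  # place parameters in shared memory for all workers
    procs = [mp.Process(target=worker, args=(shared,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Because every worker steps the shared parameters directly, no replay buffer or parameter server is needed; the decorrelation across parallel workers is what stabilizes training.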
Statements (51)
| Predicate | Object |
|---|---|
| instanceOf | reinforcement learning algorithm |
| abbreviationOf | Asynchronous Advantage Actor-Critic |
| canUseNetworkType | convolutional neural networks → recurrent neural networks |
| comparedWith | DQN |
| developedAtOrganization | DeepMind |
| fullName | Asynchronous Advantage Actor-Critic |
| handlesInputType | high-dimensional sensory input → raw pixel observations |
| hasLearningParadigm | model-free reinforcement learning |
| hasLearningType | actor-critic method → policy gradient method |
| hasProperty | does not require experience replay → efficient use of multi-core CPUs → improves training stability via parallelism |
| inspiredAlgorithms | A2C → ACKTR → IMPALA |
| introducedBy | Adrià Puigdomènech Badia → Alex Graves → David Silver → Koray Kavukcuoglu → Mehdi Mirza → Tim Harley → Timothy P. Lillicrap → Volodymyr Mnih |
| introducedInPaper | Asynchronous Methods for Deep Reinforcement Learning |
| introducedInYear | 2016 |
| isOnPolicy | true |
| optimizationObjective | maximize expected cumulative reward |
| optimizationStyle | asynchronous gradient descent |
| reducesVariance | policy gradient estimates |
| supportsParallelism | true |
| targetDomain | Atari 2600 games → continuous control problems → control tasks |
| trainingSignalType | bootstrapped returns |
| usesArchitecture | actor-critic architecture |
| usesBaseline | state-value function |
| usesComponent | advantage function → policy network → value network |
| usesExplorationMethod | on-policy exploration |
| usesLossComponent | entropy regularization → policy loss → value loss |
| usesNeuralNetworks | deep neural networks |
| usesParallelAgents | multiple parallel workers |
| usesSignal | advantage estimate |
| usesTrainingMode | asynchronous training |
| usesUpdateScheme | asynchronous gradient updates |
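The rows above for trainingSignalType, usesBaseline, usesSignal, and usesLossComponent fit together as in the following rough sketch (PyTorch assumed; the function names `n_step_returns` and `a3c_loss` and the loss coefficients are illustrative, not taken from the paper).

```python
import torch
import torch.nn.functional as F


def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Bootstrapped n-step returns R_t = r_t + gamma * R_{t+1}.

    rewards: 1-D tensor of per-step rewards for the rollout.
    bootstrap_value: detached scalar tensor, the critic's estimate for the
    state reached after the last step (the bootstrap).
    """
    returns = []
    R = bootstrap_value
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return torch.stack(returns)


def a3c_loss(logits, values, actions, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Combine the three loss components: policy loss, value loss, and
    entropy regularization. Coefficients are illustrative defaults."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()

    # Advantage estimate: bootstrapped return minus the state-value baseline.
    advantages = returns - values

    # Policy loss: the advantage is treated as a constant for the actor.
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen * advantages.detach()).mean()

    # Value loss: regress the baseline towards the bootstrapped returns.
    value_loss = F.mse_loss(values, returns)

    # Entropy regularization discourages premature convergence of the policy.
    entropy = -(probs * log_probs).sum(dim=-1).mean()

    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

Subtracting the entropy term (rather than adding it) rewards higher-entropy policies, which is what the entropy regularization listed above is for; the advantage both drives the policy gradient and, via the baseline, reduces its variance.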
Referenced by (1)
| Subject (surface form when different) | Predicate |
|---|---|
| OpenAI Baselines | implementsAlgorithm |