A3C

E99656

A3C (Asynchronous Advantage Actor-Critic) is a reinforcement learning algorithm that runs multiple parallel actor-learners, each interacting with its own copy of the environment, and stabilizes training by applying their gradient updates asynchronously to shared policy and value networks.
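The per-worker objective that the statements below describe (advantage-weighted policy gradient, value regression, entropy regularization over an n-step bootstrapped return) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, argument layout, and default coefficients are assumptions, and environment interaction and shared-parameter plumbing are elided.

```python
def a3c_loss(log_probs, values, entropies, rewards, bootstrap_value,
             gamma=0.99, value_coef=0.5, entropy_coef=0.01):
    """Scalar A3C loss for one n-step rollout of a single worker.

    log_probs[t]   : log pi(a_t | s_t) for the action taken
    values[t]      : critic estimate V(s_t)
    entropies[t]   : entropy of pi(. | s_t)
    rewards[t]     : reward received after action a_t
    bootstrap_value: V(s_{t+n}), used to bootstrap the n-step return
    """
    R = bootstrap_value
    policy_loss = value_loss = entropy_bonus = 0.0
    # Walk the rollout backwards, accumulating discounted n-step returns.
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R          # bootstrapped return R_t
        advantage = R - values[t]           # A(s_t, a_t) = R_t - V(s_t)
        policy_loss += -log_probs[t] * advantage
        value_loss += advantage ** 2
        entropy_bonus += entropies[t]
    # Entropy is subtracted: maximizing it discourages premature convergence
    # to a deterministic policy.
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
```

In the full algorithm, each worker backpropagates this loss through its local network copy and applies the resulting gradients asynchronously to the shared parameters.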


Statements (51)
Predicate	Object
instanceOf	reinforcement learning algorithm
abbreviationOf	Asynchronous Advantage Actor-Critic
canUseNetworkType	convolutional neural networks; recurrent neural networks
comparedWith	DQN
developedAtOrganization	DeepMind
fullName	Asynchronous Advantage Actor-Critic
handlesInputType	high-dimensional sensory input; raw pixel observations
hasLearningParadigm	model-free reinforcement learning
hasLearningType	actor-critic method; policy gradient method
hasProperty	does not require experience replay; efficient use of multi-core CPUs; improves training stability via parallelism
inspiredAlgorithms	A2C; ACKTR; IMPALA
introducedBy	Adrià Puigdomènech Badia; Alex Graves; David Silver; Koray Kavukcuoglu; Mehdi Mirza; Tim Harley; Timothy P. Lillicrap; Volodymyr Mnih
introducedInPaper	Asynchronous Methods for Deep Reinforcement Learning
introducedInYear	2016
isOnPolicy	true
optimizationObjective	maximize expected cumulative reward
optimizationStyle	asynchronous gradient descent
reducesVariance	policy gradient estimates
supportsParallelism	true
targetDomain	Atari 2600 games; continuous control problems; control tasks
trainingSignalType	bootstrapped returns
usesArchitecture	actor-critic architecture
usesBaseline	state-value function
usesComponent	advantage function; policy network; value network
usesExplorationMethod	on-policy exploration
usesLossComponent	entropy regularization; policy loss; value loss
usesNeuralNetworks	deep neural networks
usesParallelAgents	multiple parallel workers
usesSignal	advantage estimate
usesTrainingMode	asynchronous training
usesUpdateScheme	asynchronous gradient updates
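The `usesParallelAgents` and `usesUpdateScheme` statements above describe Hogwild-style lock-free updates: many workers write gradients into shared parameters without synchronization. A toy sketch with plain threads, where the one-weight "gradient" is purely illustrative:

```python
import threading

shared_params = [0.0]  # shared parameter vector (a single weight, for illustration)

def worker(n_updates, lr=0.01):
    # Each worker computes a local gradient and applies it to the shared
    # parameters without any lock, as in Hogwild-style asynchronous SGD.
    for _ in range(n_updates):
        grad = shared_params[0] - 1.0   # gradient of 0.5 * (w - 1)^2
        shared_params[0] -= lr * grad   # lock-free update of the shared weight

threads = [threading.Thread(target=worker, args=(200,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All workers have pushed the shared weight toward the minimum at w = 1.
```

Occasional races can lose or reorder individual updates, but each applied step still moves the shared weight toward the optimum, which is why A3C tolerates the lack of locking.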

Referenced by (1)
Subject (surface form when different)	Predicate
OpenAI Baselines	implementsAlgorithm
