A3C
E99656
A3C (Asynchronous Advantage Actor-Critic) is a reinforcement learning algorithm in which multiple parallel worker agents each interact with their own copy of the environment and apply asynchronous gradient updates to a shared policy and value function.
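The core loop can be sketched roughly as follows, assuming PyTorch; the `ActorCritic` and `worker` names, the layer sizes, the plain SGD optimizer, and the random tensors standing in for an environment rollout are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp


class ActorCritic(nn.Module):
    """Small shared-body actor-critic network (layer sizes are illustrative)."""

    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.policy_head = nn.Linear(64, n_actions)  # actor: action logits
        self.value_head = nn.Linear(64, 1)           # critic: state-value baseline

    def forward(self, obs):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)


def worker(shared_model, n_updates=100, rollout_len=5):
    """One asynchronous worker: sync with the shared model, compute a local
    gradient, and apply it to the shared parameters without locking.

    Random tensors stand in for a real rollout so the sketch stays
    self-contained; the paper's workers each step their own copy of the
    environment and use bootstrapped n-step returns.
    """
    local_model = ActorCritic()
    # The paper uses RMSProp (optionally with shared statistics); plain SGD
    # over the *shared* parameters keeps this sketch simple.
    optimizer = torch.optim.SGD(shared_model.parameters(), lr=1e-3)

    for _ in range(n_updates):
        local_model.load_state_dict(shared_model.state_dict())  # pull latest params

        obs = torch.randn(rollout_len, 4)   # stand-in for observations
        returns = torch.randn(rollout_len)  # stand-in for bootstrapped returns
        logits, values = local_model(obs)
        actions = torch.distributions.Categorical(logits=logits).sample()

        log_probs = torch.log_softmax(logits, dim=-1)
        chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
        advantages = returns - values                    # advantage estimate
        loss = (-(chosen * advantages.detach()).mean()   # policy loss
                + 0.5 * advantages.pow(2).mean())        # value loss

        optimizer.zero_grad()
        loss.backward()
        # Copy local gradients onto the shared parameters, then step them
        # asynchronously (no lock, Hogwild-style).
        for local_p, shared_p in zip(local_model.parameters(),
                                     shared_model.parameters()):
            shared_p.grad = local_p.grad
        optimizer.step()


if __name__ == "__main__":
    shared = ActorCritic()
    shared.share_memory()  # place parameters in shared memory for all workers
    procs = [mp.Process(target=worker, args=(shared,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Because every worker steps the shared parameters directly, no replay buffer or parameter server is needed; the decorrelation across parallel workers is what stabilizes training.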
Statements (51)
| Predicate | Object |
|---|---|
| instanceOf | reinforcement learning algorithm |
| abbreviationOf | Asynchronous Advantage Actor-Critic |
| canUseNetworkType | convolutional neural networks → recurrent neural networks |
| comparedWith | DQN |
| developedAtOrganization | DeepMind |
| fullName | Asynchronous Advantage Actor-Critic |
| handlesInputType | high-dimensional sensory input → raw pixel observations |
| hasLearningParadigm | model-free reinforcement learning |
| hasLearningType | actor-critic method → policy gradient method |
| hasProperty | does not require experience replay → efficient use of multi-core CPUs → improves training stability via parallelism |
| inspiredAlgorithms | A2C → ACKTR → IMPALA |
| introducedBy | Adrià Puigdomènech Badia → Alex Graves → David Silver → Koray Kavukcuoglu → Mehdi Mirza → Tim Harley → Timothy P. Lillicrap → Volodymyr Mnih |
| introducedInPaper | Asynchronous Methods for Deep Reinforcement Learning |
| introducedInYear | 2016 |
| isOnPolicy | true |
| optimizationObjective | maximize expected cumulative reward |
| optimizationStyle | asynchronous gradient descent |
| reducesVariance | policy gradient estimates |
| supportsParallelism | true |
| targetDomain | Atari 2600 games → continuous control problems → control tasks |
| trainingSignalType | bootstrapped returns |
| usesArchitecture | actor-critic architecture |
| usesBaseline | state-value function |
| usesComponent | advantage function → policy network → value network |
| usesExplorationMethod | on-policy exploration |
| usesLossComponent | entropy regularization → policy loss → value loss |
| usesNeuralNetworks | deep neural networks |
| usesParallelAgents | multiple parallel workers |
| usesSignal | advantage estimate |
| usesTrainingMode | asynchronous training |
| usesUpdateScheme | asynchronous gradient updates |
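The rows above for trainingSignalType, usesBaseline, usesSignal, and usesLossComponent fit together as in the following rough sketch (PyTorch assumed; the function names `n_step_returns` and `a3c_loss` and the loss coefficients are illustrative, not taken from the paper).

```python
import torch
import torch.nn.functional as F


def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Bootstrapped n-step returns R_t = r_t + gamma * R_{t+1}.

    rewards: 1-D tensor of per-step rewards for the rollout.
    bootstrap_value: detached scalar tensor, the critic's estimate for the
    state reached after the last step (the bootstrap).
    """
    returns = []
    R = bootstrap_value
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return torch.stack(returns)


def a3c_loss(logits, values, actions, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Combine the three loss components: policy loss, value loss, and
    entropy regularization. Coefficients are illustrative defaults."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()

    # Advantage estimate: bootstrapped return minus the state-value baseline.
    advantages = returns - values

    # Policy loss: the advantage is treated as a constant for the actor.
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen * advantages.detach()).mean()

    # Value loss: regress the baseline towards the bootstrapped returns.
    value_loss = F.mse_loss(values, returns)

    # Entropy regularization discourages premature convergence of the policy.
    entropy = -(probs * log_probs).sum(dim=-1).mean()

    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

Subtracting the entropy term (rather than adding it) rewards higher-entropy policies, which is what the entropy regularization listed above is for; the advantage both drives the policy gradient and, via the baseline, reduces its variance.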
Referenced by (1)
| Subject (surface form when different) | Predicate |
|---|---|
| OpenAI Baselines | implementsAlgorithm |