A2C

E98476

A2C (Advantage Actor-Critic) is a synchronous, on-policy policy-gradient reinforcement learning algorithm, the synchronous variant of A3C, that combines value-based and policy-based methods to improve training stability and sample efficiency.
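As a minimal sketch of how these pieces combine, assuming a PyTorch actor that outputs a torch.distributions.Categorical and a critic that outputs state-value estimates (the function name a2c_loss and the coefficient defaults are illustrative, not taken from a specific library):

import torch
import torch.nn.functional as F

def a2c_loss(dist, actions, values, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Combined A2C loss for one rollout batch.

    dist    -- torch.distributions.Categorical produced by the actor
    actions -- actions actually taken during the rollout
    values  -- critic's V(s) estimates for the rollout states
    returns -- bootstrapped return targets for the same states
    """
    # Advantage A(s,a) = target return - V(s); detached so the policy
    # gradient does not flow back into the critic.
    advantages = (returns - values).detach()
    # Policy loss: advantage-weighted negative log-probabilities.
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    # Value loss: regression of V(s) toward the return targets.
    value_loss = F.mse_loss(values, returns)
    # Entropy bonus encourages exploration (subtracted from the loss).
    entropy_bonus = dist.entropy().mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus

The value and entropy coefficients shown (0.5 and 0.01) are common defaults in published implementations, but they are assumptions here, not values stated by this entry.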


Statements (48)
Predicate Object
instanceOf actor-critic method
instanceOf policy gradient method
instanceOf reinforcement learning algorithm
actorOutputs action probabilities
actorUpdatedWith advantage-weighted log-probabilities
advantageDefinition A(s,a) = Q(s,a) - V(s) (see the code sketch after these statements)
canHandle continuous observation spaces
canHandle discrete action spaces
canHandle high-dimensional state spaces
canUse multiple parallel environments
category deep reinforcement learning
combines policy-based methods
combines value-based methods
criticOutputs state-value estimate
criticTrainedWith regression to returns or bootstrapped targets
entropyBonusPurpose encourage exploration
fullName Advantage Actor-Critic
goal improve sample efficiency
goal improve training stability
goal reduce gradient variance
implementedIn OpenAI Baselines
implementedIn PyTorch-based RL libraries
implementedIn Stable Baselines
implementedIn Stable Baselines3
implementedIn TensorFlow-based RL libraries
isOnPolicy true
isPolicyBased true
isRelatedTo A3C
isSynchronous true
isSynchronousVariantOf A3C
isValueBased true
optimizes stochastic policy
reducesVarianceUsing advantage estimation
reducesVarianceUsing value function baseline
trainingSignal temporal-difference error
typicalUseCase Atari game playing
typicalUseCase continuous control tasks
typicalUseCase discrete action tasks
updateFrequency multiple environment steps per update
usesAdvantageFunction true
usesBaseline state-value function
usesFunctionApproximator neural network
usesLearningParadigm model-free reinforcement learning
usesLossComponent entropy regularization
usesLossComponent policy loss
usesLossComponent value loss
usesObjective policy gradient objective
usesUpdateType synchronous gradient updates
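Putting several of the statements above together (the advantage definition A(s,a) = Q(s,a) - V(s), a critic trained by regression to bootstrapped targets, and synchronous rollouts collected over multiple environment steps from parallel environments), here is a NumPy sketch of the usual n-step target and advantage computation; the function bootstrapped_targets and its argument names are hypothetical, introduced only for illustration:

import numpy as np

def bootstrapped_targets(rewards, values, last_value, dones, gamma=0.99):
    """Compute discounted n-step return targets and advantages for a
    synchronous rollout with arrays of shape (n_steps, n_envs).

    rewards, values, dones -- collected from the parallel environments
    last_value             -- critic estimate V(s_T) that bootstraps the
                              tail of the rollout
    """
    n_steps = rewards.shape[0]
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = last_value
    for t in reversed(range(n_steps)):
        # Zero out the bootstrap term where an episode ended at step t.
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    # Advantage estimate: n-step return minus the value baseline,
    # i.e. A(s,a) ~ R_t - V(s_t).
    advantages = returns - values
    return returns, advantages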

Referenced by (2)
Subject (surface form when different) Predicate
OpenAI Baselines implementsAlgorithm
Stable Baselines supportsAlgorithm
