A2C

E98476

A2C (Advantage Actor-Critic) is a popular synchronous policy gradient reinforcement learning algorithm that combines value-based and policy-based methods to improve training stability and efficiency.


All labels observed (1)

Label Occurrences
A2C canonical 4

Statements (48)

Predicate Object
instanceOf actor-critic method
policy gradient method
reinforcement learning algorithm
actorOutputs action probabilities
actorUpdatedWith advantage-weighted log-probabilities
advantageDefinition A(s,a) = Q(s,a) - V(s)
canHandle continuous observation spaces
discrete action spaces
high-dimensional state spaces
canUse multiple parallel environments
category deep reinforcement learning
combines policy-based methods
value-based methods
criticOutputs state-value estimate
criticTrainedWith regression to returns or bootstrapped targets
entropyBonusPurpose encourage exploration
fullName Advantage Actor-Critic
goal improve sample efficiency
improve training stability
reduce gradient variance
implementedIn OpenAI Baselines
PyTorch-based RL libraries
Stable Baselines
Stable Baselines
surface form: Stable Baselines3
TensorFlow-based RL libraries
isOnPolicy true
isPolicyBased true
isRelatedTo A3C
isSynchronous true
isSynchronousVariantOf A3C
isValueBased true
optimizes stochastic policy
reducesVarianceUsing advantage estimation
value function baseline
trainingSignal temporal-difference error
typicalUseCase Atari game playing
continuous control tasks
discrete action tasks
updateFrequency multiple environment steps per update
usesAdvantageFunction true
usesBaseline state-value function
usesFunctionApproximator neural network
usesLearningParadigm model-free reinforcement learning
usesLossComponent entropy regularization
policy loss
value loss
usesObjective policy gradient objective
usesUpdateType synchronous gradient updates
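
Several of the statements above (advantageDefinition, actorUpdatedWith, criticTrainedWith, entropyBonusPurpose, usesLossComponent) describe parts of a single training objective. The following is a minimal sketch of how those parts might fit together, not a reference implementation. The function names, the coefficient defaults (value_coef=0.5, entropy_coef=0.01), and gamma=0.99 are assumptions chosen for illustration; the advantage is estimated as the bootstrapped return minus V(s), with the return standing in for Q(s,a).

```python
import math


def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Bootstrapped targets R_t = r_t + gamma * R_{t+1}, seeded with V(s_T)."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a2c_loss(log_probs, entropies, values, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Combine policy loss, value loss, and entropy bonus for one rollout.

    log_probs: log pi(a_t | s_t) of the actions actually taken (actor)
    entropies: entropy of the actor's action distribution at each step
    values:    V(s_t) predicted by the critic
    returns:   bootstrapped targets, e.g. from discounted_returns()
    """
    n = len(returns)
    # Advantage estimate: A(s,a) ~ R - V(s). In an autograd framework this
    # term would be treated as a constant inside the policy loss.
    advantages = [ret - val for ret, val in zip(returns, values)]
    # Actor: advantage-weighted log-probabilities (negated for minimization).
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic: squared-error regression toward the bootstrapped targets.
    value_loss = sum(adv ** 2 for adv in advantages) / n
    # Entropy bonus: subtracted so that higher entropy lowers the loss.
    entropy = sum(entropies) / n
    total = policy_loss + value_coef * value_loss - entropy_coef * entropy
    return total, policy_loss, value_loss, entropy


if __name__ == "__main__":
    rewards = [1.0, 1.0]
    returns = discounted_returns(rewards, bootstrap_value=0.0, gamma=0.5)
    # Uniform policy over two actions: log-prob ln(0.5), entropy ln(2).
    log_probs = [math.log(0.5)] * 2
    entropies = [math.log(2.0)] * 2
    values = [1.0, 1.0]
    print(a2c_loss(log_probs, entropies, values, returns))
```

In a synchronous setup with multiple parallel environments, the same computation would typically be applied to the rollouts from all environments at once, followed by a single gradient update.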

Referenced by (4)

Full triples — surface form annotated when it differs from this entity's canonical label.