A3C
E99656
A3C (Asynchronous Advantage Actor-Critic) is a reinforcement learning algorithm in which multiple parallel workers, each interacting with its own copy of the environment, learn a shared policy and value function by applying gradient updates to the shared parameters asynchronously.
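A minimal sketch of the asynchronous-update pattern, under heavy simplifying assumptions. The environment is a hypothetical two-armed bandit (`REWARDS`), the policy is a table of softmax logits rather than a deep network, and every step is a one-step episode, so the return equals the reward. The names `SharedParams`, `worker` and `train` are illustrative, not from any library. Each worker thread copies the shared parameters, acts, computes an advantage against a scalar value baseline, and writes its updates back without locking. Python threads share the GIL, so this shows the update pattern, not the multi-core speedup.

```python
import math
import random
import threading
from typing import List

# Hypothetical toy environment: a two-armed bandit where arm 1 pays 1.0 and arm 0 pays 0.0.
REWARDS = [0.0, 1.0]


class SharedParams:
    """Global parameters shared by every worker: softmax policy logits and a scalar value baseline."""

    def __init__(self, n_actions: int) -> None:
        self.logits: List[float] = [0.0] * n_actions
        self.value = 0.0


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def worker(shared: SharedParams, seed: int, steps: int, lr: float = 0.1) -> None:
    """One actor-learner: sync, act, compute the advantage, push updates to the shared params."""
    rng = random.Random(seed)  # each worker explores with its own random stream
    for _ in range(steps):
        # Copy the current global parameters into a local snapshot.
        logits = list(shared.logits)
        value = shared.value
        # Act with the local policy; one-step episode, so return == reward.
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = REWARDS[action]
        advantage = reward - value
        # Policy-gradient ascent on advantage * log pi(action); for a softmax policy,
        # d log pi(action) / d logit[a] = onehot(action)[a] - probs[a].
        # Updates go straight into the shared parameters without a lock (Hogwild-style).
        for a in range(len(logits)):
            grad_logp = (1.0 if a == action else 0.0) - probs[a]
            shared.logits[a] += lr * advantage * grad_logp
        # Move the baseline toward the observed return (gradient step on 0.5 * (return - value)**2).
        shared.value += lr * advantage


def train(n_workers: int = 4, steps_per_worker: int = 500) -> SharedParams:
    shared = SharedParams(len(REWARDS))
    threads = [threading.Thread(target=worker, args=(shared, seed, steps_per_worker))
               for seed in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Usage: `softmax(train().logits)` should put almost all probability on arm 1. Exact numbers vary between runs because thread scheduling is nondeterministic; stale snapshots and occasionally lost writes are tolerated by design, which is the trade-off asynchronous updates make in exchange for not needing locks or an experience replay buffer.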
All labels observed (1)
| Label | Occurrences |
|---|---|
| A3C canonical | 4 |
Statements (51)
| Predicate | Object |
|---|---|
| instanceOf | reinforcement learning algorithm |
| abbreviationOf | Asynchronous Advantage Actor-Critic |
| canUseNetworkType | convolutional neural networks; recurrent neural networks |
| comparedWith | Atari deep Q-network (surface form: DQN) |
| developedAtOrganization | DeepMind |
| fullName | Asynchronous Advantage Actor-Critic |
| handlesInputType | high-dimensional sensory input; raw pixel observations |
| hasLearningParadigm | model-free reinforcement learning |
| hasLearningType | actor-critic method; policy gradient method |
| hasProperty | does not require experience replay; efficient use of multi-core CPUs; improves training stability via parallelism |
| inspiredAlgorithms | A2C; ACKTR; IMPALA |
| introducedBy | Adrià Puigdomènech Badia; Alex Graves; David Silver; Koray Kavukcuoglu; Mehdi Mirza; Tim Harley; Timothy P. Lillicrap; Volodymyr Mnih |
| introducedInPaper | Asynchronous Methods for Deep Reinforcement Learning |
| introducedInYear | 2016 |
| isOnPolicy | true |
| optimizationObjective | maximize expected cumulative reward |
| optimizationStyle | asynchronous gradient descent |
| reducesVariance | policy gradient estimates |
| supportsParallelism | true |
| targetDomain | Atari 2600 (surface form: Atari 2600 games); continuous control problems; control tasks |
| trainingSignalType | bootstrapped returns |
| usesArchitecture | actor-critic architecture |
| usesBaseline | state-value function |
| usesComponent | advantage function; policy network; value network |
| usesExplorationMethod | on-policy exploration |
| usesLossComponent | entropy regularization; policy loss; value loss |
| usesNeuralNetworks | deep neural networks |
| usesParallelAgents | multiple parallel workers |
| usesSignal | advantage estimate |
| usesTrainingMode | asynchronous training |
| usesUpdateScheme | asynchronous gradient updates |
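The training signal and loss components listed in the table (bootstrapped returns, advantage estimate, policy loss, value loss, entropy regularization) can be combined for one worker's rollout of n steps roughly as follows. This is a sketch on plain Python floats rather than framework tensors, so it computes loss values only, not gradients; in an autodiff framework the advantage would be treated as a constant inside the policy term. The function names and the weights `value_coef=0.5` and `entropy_coef=0.01` are illustrative assumptions, not values stated in this entry.

```python
import math
from typing import List, Sequence, Tuple


def n_step_returns(rewards: Sequence[float], bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Bootstrapped n-step returns, computed backward: R_t = r_t + gamma * R_{t+1}.

    bootstrap_value is V(s_n) for the state after the last step,
    or 0.0 if the rollout ended in a terminal state.
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a3c_loss(log_probs_taken: Sequence[float],
             action_probs: Sequence[Sequence[float]],
             values: Sequence[float],
             rewards: Sequence[float],
             bootstrap_value: float,
             gamma: float = 0.99,
             value_coef: float = 0.5,     # assumed weighting, not from the source
             entropy_coef: float = 0.01,  # assumed weighting, not from the source
             ) -> Tuple[float, float, float, float]:
    """Return (total, policy_loss, value_loss, entropy) for one rollout."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    # Advantage estimate: bootstrapped return minus the state-value baseline.
    advantages = [ret - v for ret, v in zip(returns, values)]
    # Policy-gradient loss: -sum_t log pi(a_t | s_t) * A_t.
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs_taken, advantages))
    # Value loss: squared error between each return and its value estimate.
    value_loss = sum(adv ** 2 for adv in advantages)
    # Policy entropy summed over time steps; subtracting it in the total encourages exploration.
    entropy = -sum(p * math.log(p) for dist in action_probs for p in dist if p > 0.0)
    total = policy_loss + value_coef * value_loss - entropy_coef * entropy
    return total, policy_loss, value_loss, entropy
```

For example, with `gamma=0.5` and a bootstrap value of 2.0, the rewards `[1.0, 0.0, 1.0]` give returns `[1.5, 1.0, 2.0]`. Because each return bootstraps from a learned value estimate instead of waiting for the episode to end, a worker can update after a fixed number of steps, which is what lets many workers push frequent, small asynchronous updates.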
Referenced by (4)
Full triples — surface form annotated when it differs from this entity's canonical label.