Dueling DQN
E98474
Deep Q-Network variant
deep reinforcement learning algorithm
value-based reinforcement learning method
Dueling DQN is a deep reinforcement learning algorithm that estimates the state value and the per-action advantages in two separate streams of its neural network, then combines them into Q-values, to improve learning efficiency and stability over standard DQN.
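As a sketch of how the two streams are combined, the mean-subtracted aggregation can be written as below. The symbols are assumptions that do not appear on this page: θ stands for the shared feature-extractor parameters, β for the value-stream parameters, α for the advantage-stream parameters, and 𝒜 for the discrete action set.

```latex
Q(s, a; \theta, \alpha, \beta)
  = V(s; \theta, \beta)
  + \Big( A(s, a; \theta, \alpha)
  - \frac{1}{|\mathcal{A}|} \sum_{a' \in \mathcal{A}} A(s, a'; \theta, \alpha) \Big)
```

Subtracting the mean advantage makes the split identifiable: adding a constant to every advantage and subtracting it from the value would otherwise leave Q unchanged.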
Statements (44)
| Predicate | Object |
|---|---|
| instanceOf | Deep Q-Network variant; deep reinforcement learning algorithm; value-based reinforcement learning method |
| actionSpaceType | discrete |
| aimsToImprove | learning efficiency; training stability |
| basedOn | Q-learning |
| citationVenue | Proceedings of the 33rd International Conference on Machine Learning |
| combinesStreamsToEstimate | Q-values |
| commonlyEvaluatedOn | Atari 2600 games |
| controlType | off-policy |
| domain | artificial intelligence |
| especiallyHelpsWhen | many actions have similar value; only a few actions affect the value of the state |
| extends | Deep Q-Network |
| field | reinforcement learning |
| hasComponent | advantage stream; value stream |
| hasFullName | Dueling Deep Q-Network |
| hasKeyIdea | decouple representation of state value from representation of advantages for each action |
| implementedIn | DeepMind Atari agent |
| improvesOver | standard DQN |
| influenced | Rainbow DQN |
| introducedBy | Hado van Hasselt; Marc Lanctot; Matteo Hessel; Nando de Freitas; Tom Schaul; Ziyu Wang |
| introducedInPaper | Dueling Network Architectures for Deep Reinforcement Learning |
| learningParadigm | model-free |
| normalizesAdvantageStream | by subtracting mean advantage |
| oftenCombinedWith | Double DQN; Prioritized Experience Replay |
| publishedAtConference | ICML 2016 |
| separatesEstimationOf | advantage function; state-value function |
| sharesFeatureExtractor | between value and advantage streams |
| usesFunctionApproximator | deep neural network |
| usesLossFunction | temporal-difference loss |
| usesOptimizationMethod | Adam optimizer; stochastic gradient descent |
| usesTargetNetwork | yes |
| yearIntroduced | 2016 |
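
The statements above on stream combination, mean-advantage normalization, the temporal-difference loss, and the target network can be illustrated with a minimal sketch for a single state. It uses plain Python lists rather than a deep network, and every name in it (`combine_streams`, `td_target`, `squared_td_error`, the `gamma` default) is hypothetical and chosen for illustration only. It is not the implementation from the original paper.

```python
from typing import List, Sequence


def combine_streams(value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a scalar state value with per-action advantages into Q-values.

    The mean advantage is subtracted so that the value/advantage split is
    identifiable (assumed aggregation: Q = V + A - mean(A)).
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [value + a - mean_advantage for a in advantages]


def td_target(reward: float, next_q_values: Sequence[float],
              done: bool, gamma: float = 0.99) -> float:
    """One-step Q-learning target.

    next_q_values are assumed to come from a separate target network
    evaluated on the next state; terminal transitions do not bootstrap.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_taken: float, target: float) -> float:
    """Squared temporal-difference error for the action actually taken."""
    return (target - q_taken) ** 2


# Two advantage vectors that differ only by a constant offset give the
# same Q-values once the mean is subtracted.
q_a = combine_streams(2.0, [1.0, 0.0, -1.0])    # [3.0, 2.0, 1.0]
q_b = combine_streams(2.0, [11.0, 10.0, 9.0])   # [3.0, 2.0, 1.0]

# Target for reward 1.0, bootstrapping from q_a with gamma = 0.5.
target = td_target(1.0, q_a, done=False, gamma=0.5)   # 2.5
loss = squared_td_error(q_taken=2.0, target=target)   # 0.25
```

In a full agent, the value and advantages would be the outputs of two network heads on top of a shared feature extractor, and the loss would be averaged over a minibatch sampled from a replay buffer.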
Referenced by (2)
Full triples — surface form annotated when it differs from this entity's canonical label.