Double DQN

E101969

Double DQN is a reinforcement learning algorithm that improves upon standard Deep Q-Networks by reducing overestimation bias through decoupling action selection from action evaluation.
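The decoupling can be made concrete by comparing the two bootstrap targets. The notation here is assumed for illustration and does not appear elsewhere on this page: \theta denotes the online network parameters, \theta^{-} the target network parameters, r the reward, \gamma the discount factor, and s' the next state.

```latex
% Standard DQN: the target network both selects and evaluates the action.
Y^{\mathrm{DQN}} = r + \gamma \max_{a} Q(s', a; \theta^{-})

% Double DQN: the online network selects, the target network evaluates.
Y^{\mathrm{DoubleDQN}} = r + \gamma \, Q\bigl(s', \arg\max_{a} Q(s', a; \theta); \theta^{-}\bigr)
```

Because the evaluated action need not be the one that maximizes the target network's own estimate, the Double DQN target is never larger than the DQN target for the same transition.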

Statements (48)

Predicate Object
instanceOf Deep Q-Learning variant
model-free algorithm
off-policy algorithm
reinforcement learning algorithm
value-based reinforcement learning method
addressesProblem overestimation of action values in DQN
alsoKnownAs Double DQN
surface form: Double Deep Q-Network
basedOn Q-learning
category deep reinforcement learning
commonlyUses epsilon-greedy exploration
experience replay
target network with delayed updates
empiricalResult often achieves higher scores than DQN on Atari benchmarks
reduces overestimation of Q-values compared to DQN
evaluationDomain Atari 2600
surface form: Atari 2600 games
extends Deep Q-Learning
surface form: Deep Q-Network
frameworkSupport implemented in many deep RL libraries
implementationDetail shares architecture with DQN but changes target calculation
improvesUpon Deep Q-Network performance stability
Deep Q-Network value estimation accuracy
influenced Dueling DQN
surface form: Dueling Double DQN
Rainbow DQN
inspiredBy Q-learning
surface form: Double Q-learning
introducedBy Arthur Guez
David Silver
Hado van Hasselt
keyIdea decouple action selection from action evaluation
learningType temporal-difference learning
modifies target value computation of DQN
networkType deep neural network approximator
notableProperty maintains same computational complexity as DQN up to constant factors
optimizationMethod stochastic gradient descent or variants
policyType greedy policy w.r.t. learned Q-values
primaryGoal reduce overestimation bias in Q-learning
publicationYear 2015
publishedIn Double DQN
surface form: paper "Deep Reinforcement Learning with Double Q-learning"
reduces positive bias in max operator over noisy value estimates
requires discrete action space
targetComputation uses argmax over online network Q-values to select action
uses target network Q-value of selected action for evaluation
trainingMode batch updates from replay buffer
updateRule uses separate networks in target for action selection and evaluation
usedIn control tasks in simulated environments
game-playing agents
uses online network for action selection
target network for action evaluation
two value estimates, one for action selection and one for action evaluation
valueFunction approximates state-action value function Q(s,a)
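
The targetComputation and updateRule statements above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names, the argument names, and the example Q-values are hypothetical, and the arrays stand in for the outputs of an online network and a target network evaluated on a batch of next states.

```python
import numpy as np

def dqn_target(reward, done, gamma, q_target_next):
    """Standard DQN target: the target network selects and evaluates the action."""
    return reward + gamma * (1.0 - done) * q_target_next.max(axis=1)

def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: online network selects, target network evaluates."""
    # Action selection: argmax over the online network's Q-values.
    best_action = q_online_next.argmax(axis=1)
    # Action evaluation: target network's Q-value of the selected action.
    evaluated = q_target_next[np.arange(len(best_action)), best_action]
    # Terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * evaluated

# Hypothetical batch of two transitions with three discrete actions.
q_online_next = np.array([[1.0, 2.0, 0.5],
                          [0.3, 0.1, 0.2]])
q_target_next = np.array([[3.0, 1.5, 0.0],
                          [0.4, 0.9, 0.6]])
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])   # the second transition is terminal
gamma = 0.5

y_dqn = dqn_target(reward, done, gamma, q_target_next)
y_ddqn = double_dqn_target(reward, done, gamma, q_online_next, q_target_next)
print(y_dqn)   # DQN targets
print(y_ddqn)  # Double DQN targets
```

In the first transition the online network prefers action 1 while the target network's largest estimate belongs to action 0, so the Double DQN target bootstraps from 1.5 instead of 3.0. Only the target calculation differs; the network architecture and the rest of the training loop are shared with DQN.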

Referenced by (7)

Full triples — surface form annotated when it differs from this entity's canonical label.

Dueling DQN oftenCombinedWith Double DQN
Double DQN alsoKnownAs Double DQN
this entity surface form: Double Deep Q-Network
Double DQN publishedIn Double DQN
this entity surface form: paper "Deep Reinforcement Learning with Double Q-learning"
Rainbow DQN improvesOver Double DQN