Prioritized Experience Replay DQN

E98475

Prioritized Experience Replay DQN is a variant of the Deep Q-Network algorithm that improves learning efficiency by sampling transitions from the replay buffer in proportion to their expected learning value, measured by the magnitude of their temporal-difference error, rather than uniformly.
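As a rough illustration of the core mechanism (a minimal sketch, not the paper's sum-tree implementation), the Python class below keeps a priority p_i = |δ_i| + ε for each stored transition, samples index i with probability P(i) = p_i^α / Σ_k p_k^α, and returns importance-sampling weights w_i = (N · P(i))^(−β) to correct the resulting bias. The class name `PrioritizedReplayBuffer` and the flat-array storage are illustrative choices:

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional-prioritization sketch (flat arrays, no sum tree)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # degree of prioritization (alpha = 0 recovers uniform replay)
        self.eps = eps      # keeps every priority strictly positive
        self.data = []      # stored transitions
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0        # next write position (ring buffer)

    def add(self, transition):
        # New transitions get the current maximum priority so each one
        # is replayed at least once before its priority is refined.
        max_p = self.priorities[: len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()
        idx = np.random.choice(n, batch_size, p=probs)
        # Importance-sampling weights w_i = (N * P(i))^(-beta),
        # normalized by the maximum weight for stability.
        weights = (n * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority signal: p_i = |TD error| + eps.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

The paper also describes a rank-based variant, p_i = 1/rank(i), which is less sensitive to outlier TD errors; the proportional form above is the one most commonly implemented.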


Statements (48)
Predicate Object
instanceOf Deep Q-Network variant
deep reinforcement learning algorithm
addresses inefficiency of uniform experience replay
learning from many uninformative transitions
aimsTo improve learning efficiency
improve sample efficiency
speed up convergence
applicationDomain Atari game playing
control tasks
basedOn Deep Q-Network
benefit can improve performance on Atari 2600 benchmarks
focuses updates on transitions with high learning potential
category value-based deep reinforcement learning
compatibleWith Double DQN
Dueling DQN
other off-policy value-based methods
coreIdea prioritize transitions with large temporal-difference error
sample more informative transitions with higher probability
evaluation outperforms baseline DQN with uniform replay on many games
extends uniform experience replay
field reinforcement learning
hasComponent importance sampling weight computation
priority update mechanism
priority-based sampling mechanism
hyperparameter alpha controls degree of prioritization
beta controls strength of importance sampling correction
influenced later prioritized replay methods in RL
introducedInPaper Prioritized Experience Replay
learningSignal temporal-difference error magnitude
modifies sampling distribution over replay buffer
proposedBy David Silver
Ioannis Antonoglou
John Quan
Tom Schaul
publishedAt International Conference on Learning Representations 2016
requires correction of sampling bias via importance sampling
storage of priorities alongside transitions in replay buffer
samplingStrategy proportional prioritization
rank-based prioritization
tradeOff focus on rare high-error transitions vs coverage of state space
trainingType off-policy learning
uses experience replay buffer
importance sampling exponent hyperparameter beta
importance sampling weights
neural network function approximator
priority exponent hyperparameter alpha
stochastic sampling from replay buffer
temporal-difference error as priority signal
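To show how the components listed above (TD error as the learning signal, importance-sampling weights with exponent beta, and the priority update) fit together, here is a hedged sketch of one off-policy training step using the buffer from the earlier sketch. The tiny linear networks, the dimensions, and the transition layout (obs, action, reward, next_obs, done) are stand-in assumptions, not part of the original algorithm's specification:

```python
import numpy as np
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99
# Stand-in networks; a real agent would use the DQN convolutional architecture.
q_net = nn.Linear(obs_dim, n_actions)
target_net = nn.Linear(obs_dim, n_actions)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def train_step(buffer, batch_size=32, beta=0.4):
    batch, idx, weights = buffer.sample(batch_size, beta=beta)
    # Each transition is assumed to be (obs, action, reward, next_obs, done),
    # with done encoded as 0.0 / 1.0.
    obs, act, rew, next_obs, done = (
        torch.as_tensor(np.array(col), dtype=torch.float32) for col in zip(*batch)
    )
    # TD target: r + gamma * max_a' Q_target(s', a') for non-terminal s'.
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
        target = rew + gamma * (1.0 - done) * next_q
    q = q_net(obs).gather(1, act.long().unsqueeze(1)).squeeze(1)
    td_error = target - q
    # Importance-sampling weights correct the bias introduced by
    # non-uniform sampling: weighted squared TD error as the loss.
    w = torch.as_tensor(weights, dtype=torch.float32)
    loss = (w * td_error.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Feed the new |TD error| back into the buffer as the priority signal.
    buffer.update_priorities(idx, td_error.detach().abs().numpy())
```

In the paper, beta is annealed from its initial value toward 1 over the course of training, so the bias correction becomes exact as learning converges; alpha and the beta schedule are the two main hyperparameters to tune.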

Referenced by (2)
Subject (surface form when different) Predicate
OpenAI Baselines implementsAlgorithm
Atari deep Q-network ("Prioritized Experience Replay") inspiredAlgorithm
