Prioritized Experience Replay DQN

E98475

Prioritized Experience Replay DQN is a variant of the Deep Q-Network algorithm that improves learning efficiency by replaying informative transitions (those with a large temporal-difference error) more often than uniform sampling from the replay buffer would.
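The sampling rule described above can be sketched in a few lines. This is a minimal illustration, not code from the paper or from any library. The function names (`proportional_priorities`, `rank_priorities`, `sampling_probs`, `is_weights`, `sample_batch`) and the small constant `eps` are hypothetical. The sketch follows the formulas as commonly stated:

* Proportional priorities are p_i = |δ_i| + ε.
* Rank-based priorities are p_i = 1/rank(i).
* Sampling probabilities are P(i) = p_i^α / Σ_k p_k^α.
* Importance-sampling weights are w_i = (N·P(i))^(−β), normalized by the largest weight.

```python
import random

def proportional_priorities(td_errors, eps=1e-6):
    # Proportional variant: p_i = |delta_i| + eps (eps is an assumed small constant
    # that keeps every transition's probability above zero).
    return [abs(d) + eps for d in td_errors]

def rank_priorities(td_errors):
    # Rank-based variant: p_i = 1 / rank(i), where rank 1 is the largest |delta|.
    order = sorted(range(len(td_errors)), key=lambda i: abs(td_errors[i]), reverse=True)
    priorities = [0.0] * len(td_errors)
    for rank, i in enumerate(order, start=1):
        priorities[i] = 1.0 / rank
    return priorities

def sampling_probs(priorities, alpha):
    # P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 makes every P(i) equal (uniform replay).
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def is_weights(probs, beta):
    # w_i = (N * P(i))^(-beta), divided by the largest weight so weights only scale updates down.
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    w_max = max(raw)
    return [w / w_max for w in raw]

def sample_batch(probs, batch_size, rng=random):
    # Draw transition indices with replacement according to P(i).
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)

# Usage: four transitions with hypothetical TD errors.
td = [0.5, -2.0, 0.1, 1.0]
probs = sampling_probs(proportional_priorities(td), alpha=0.6)
batch = sample_batch(probs, batch_size=2)
weights = is_weights(probs, beta=0.4)
```

The transition with the largest |δ| receives the highest sampling probability. The rarely sampled transitions receive the largest importance-sampling weights. With beta = 1 the weights fully compensate for the non-uniform sampling, and the paper anneals beta toward 1 over the course of training.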



Statements (48)

Predicate Object
instanceOf Deep Q-Network variant
deep reinforcement learning algorithm
addresses inefficiency of uniform experience replay
learning from many uninformative transitions
aimsTo improve learning efficiency
improve sample efficiency
speed up convergence
applicationDomain Atari game playing
control tasks
basedOn Atari deep Q-network
surface form: Deep Q-Network
benefit can improve performance on Atari 2600 benchmarks
focuses updates on transitions with high learning potential
category value-based deep reinforcement learning
compatibleWith Double DQN
Dueling DQN
other off-policy value-based methods
coreIdea prioritize transitions with large temporal-difference error
sample more informative transitions with higher probability
evaluation outperforms baseline DQN with uniform replay on 41 of 49 Atari games
extends uniform experience replay
field reinforcement learning
hasComponent importance sampling weight computation
priority update mechanism
priority-based sampling mechanism
hyperparameter alpha controls degree of prioritization (alpha = 0 recovers uniform sampling)
beta controls strength of importance sampling correction (beta = 1 fully corrects the sampling bias)
influenced later prioritized replay methods in RL
introducedInPaper Prioritized Experience Replay DQN (self-link; surface form differs)
surface form: Prioritized Experience Replay
learningSignal temporal-difference error magnitude
modifies sampling distribution over replay buffer
proposedBy David Silver
Ioannis Antonoglou
John Quan
Tom Schaul
publishedAt ICLR
surface form: International Conference on Learning Representations 2016
requires correction of sampling bias via importance sampling
storage of priorities alongside transitions in replay buffer
samplingStrategy proportional prioritization
rank-based prioritization
tradeOff focus on rare high-error transitions vs coverage of state space
trainingType off-policy learning
uses experience replay buffer
importance sampling exponent hyperparameter beta
importance sampling weights
neural network function approximator
priority exponent hyperparameter alpha
stochastic sampling from replay buffer
temporal-difference error as priority signal
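The components listed in the statements above (priority storage alongside transitions, priority-based sampling, priority updates, importance-sampling weights) are commonly combined in a sum-tree buffer. The paper describes this data structure for proportional prioritization. The sketch below is a hypothetical, minimal version and is not the paper's or OpenAI Baselines' code. The following are all assumptions:

* the class names `SumTree` and `PrioritizedBuffer`;
* the defaults `alpha=0.6` and `eps=1e-6`;
* normalizing weights by the batch maximum.

Sampling is stratified: the total priority is split into `batch_size` equal segments, and one value is drawn from each.

```python
import random

class SumTree:
    """Leaves hold priorities; each internal node holds the sum of its subtree."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # index 1 is the root; leaves at [capacity, 2*capacity)
        self.data = [None] * capacity       # transitions stored alongside their priorities
        self.next = 0                       # next slot to overwrite (circular buffer)
        self.size = 0

    def total(self):
        return self.tree[1]

    def update(self, idx, priority):
        # Set leaf idx, then recompute sums on the path to the root: O(log N).
        pos = idx + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def add(self, transition, priority):
        # Once full, overwrite the oldest slot.
        idx = self.next
        self.data[idx] = transition
        self.update(idx, priority)
        self.next = (self.next + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)
        return idx

    def find(self, value):
        # Descend from the root to the leaf whose cumulative-priority interval contains value.
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if value < self.tree[left]:
                pos = left
            else:
                value -= self.tree[left]
                pos = left + 1
        return pos - self.capacity


class PrioritizedBuffer:
    """Hypothetical buffer; the tree stores p_i = (|delta_i| + eps) ** alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.tree = SumTree(capacity)
        self.alpha, self.eps = alpha, eps
        self.max_priority = 1.0  # new transitions get the current max so each is replayed at least once

    def add(self, transition):
        self.tree.add(transition, self.max_priority)

    def sample(self, batch_size, beta, rng=random):
        total, n = self.tree.total(), self.tree.size
        segment = total / batch_size
        idxs, weights = [], []
        for k in range(batch_size):
            value = segment * (k + rng.random())         # one uniform draw per segment
            idx = min(self.tree.find(value), n - 1)      # guard against float round-off
            prob = self.tree.tree[idx + self.tree.capacity] / total
            idxs.append(idx)
            weights.append((n * prob) ** (-beta))        # importance-sampling weight
        w_max = max(weights)                             # assumed: normalize within the batch
        return idxs, [self.tree.data[i] for i in idxs], [w / w_max for w in weights]

    def update_priorities(self, idxs, td_errors):
        # After a learning step, write back new priorities computed from the TD errors.
        for idx, delta in zip(idxs, td_errors):
            p = (abs(delta) + self.eps) ** self.alpha
            self.tree.update(idx, p)
            self.max_priority = max(self.max_priority, p)
```

A training loop would call `sample`, scale each transition's loss by its weight, and then pass the new TD errors to `update_priorities`. The sum tree keeps both sampling and updates logarithmic in the buffer size, which is what makes prioritization affordable for large replay buffers.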


Referenced by (5)

Full triples; the surface form is annotated when it differs from this entity's canonical label.

OpenAI Baselines implementsAlgorithm Prioritized Experience Replay DQN
Atari deep Q-network inspiredAlgorithm Prioritized Experience Replay DQN
this entity surface form: Prioritized Experience Replay
Dueling DQN oftenCombinedWith Prioritized Experience Replay DQN
this entity surface form: Prioritized Experience Replay
Prioritized Experience Replay DQN introducedInPaper Prioritized Experience Replay DQN (self-link; surface form differs)
this entity surface form: Prioritized Experience Replay
Rainbow DQN improvesOver Prioritized Experience Replay DQN
this entity surface form: Prioritized DQN