Prioritized Experience Replay DQN

E98475

Prioritized Experience Replay DQN is a variant of the Deep Q-Network algorithm that improves learning efficiency by sampling transitions from the replay buffer in proportion to their expected learning value, measured by the magnitude of their temporal-difference error, rather than uniformly.
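As a rough illustration of the core mechanism (a minimal sketch, not the paper's sum-tree implementation), the Python class below keeps a priority p_i = |δ_i| + ε for each stored transition, samples index i with probability P(i) = p_i^α / Σ_k p_k^α, and returns importance-sampling weights w_i = (N · P(i))^(−β) to correct the resulting bias. The class name `PrioritizedReplayBuffer` and the flat-array storage are illustrative choices:

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional-prioritization sketch (flat arrays, no sum tree)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # degree of prioritization (alpha = 0 recovers uniform replay)
        self.eps = eps      # keeps every priority strictly positive
        self.data = []      # stored transitions
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0        # next write position (ring buffer)

    def add(self, transition):
        # New transitions get the current maximum priority so each one
        # is replayed at least once before its priority is refined.
        max_p = self.priorities[: len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()
        idx = np.random.choice(n, batch_size, p=probs)
        # Importance-sampling weights w_i = (N * P(i))^(-beta),
        # normalized by the maximum weight for stability.
        weights = (n * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority signal: p_i = |TD error| + eps.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

The paper also describes a rank-based variant, p_i = 1/rank(i), which is less sensitive to outlier TD errors; the proportional form above is the one most commonly implemented.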


Statements (48)
Predicate Object
instanceOf Deep Q-Network variant
deep reinforcement learning algorithm
addresses inefficiency of uniform experience replay
learning from many uninformative transitions
aimsTo improve learning efficiency
improve sample efficiency
speed up convergence
applicationDomain Atari game playing
control tasks
basedOn Deep Q-Network
benefit can improve performance on Atari 2600 benchmarks
focuses updates on transitions with high learning potential
category value-based deep reinforcement learning
compatibleWith Double DQN
Dueling DQN
other off-policy value-based methods
coreIdea prioritize transitions with large temporal-difference error
sample more informative transitions with higher probability
evaluation outperforms baseline DQN with uniform replay on many games
extends uniform experience replay
field reinforcement learning
hasComponent importance sampling weight computation
priority update mechanism
priority-based sampling mechanism
hyperparameter alpha controls degree of prioritization
beta controls strength of importance sampling correction
influenced later prioritized replay methods in RL
introducedInPaper Prioritized Experience Replay
learningSignal temporal-difference error magnitude
modifies sampling distribution over replay buffer
proposedBy David Silver
Ioannis Antonoglou
John Quan
Tom Schaul
publishedAt International Conference on Learning Representations 2016
requires correction of sampling bias via importance sampling
storage of priorities alongside transitions in replay buffer
samplingStrategy proportional prioritization
rank-based prioritization
tradeOff focus on rare high-error transitions vs coverage of state space
trainingType off-policy learning
uses experience replay buffer
importance sampling exponent hyperparameter beta
importance sampling weights
neural network function approximator
priority exponent hyperparameter alpha
stochastic sampling from replay buffer
temporal-difference error as priority signal
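To show how the components listed above (TD error as the learning signal, importance-sampling weights with exponent beta, and the priority update) fit together, here is a hedged sketch of one off-policy training step using the buffer from the earlier sketch. The tiny linear networks, the dimensions, and the transition layout (obs, action, reward, next_obs, done) are stand-in assumptions, not part of the original algorithm's specification:

```python
import numpy as np
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99
# Stand-in networks; a real agent would use the DQN convolutional architecture.
q_net = nn.Linear(obs_dim, n_actions)
target_net = nn.Linear(obs_dim, n_actions)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def train_step(buffer, batch_size=32, beta=0.4):
    batch, idx, weights = buffer.sample(batch_size, beta=beta)
    # Each transition is assumed to be (obs, action, reward, next_obs, done),
    # with done encoded as 0.0 / 1.0.
    obs, act, rew, next_obs, done = (
        torch.as_tensor(np.array(col), dtype=torch.float32) for col in zip(*batch)
    )
    # TD target: r + gamma * max_a' Q_target(s', a') for non-terminal s'.
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
        target = rew + gamma * (1.0 - done) * next_q
    q = q_net(obs).gather(1, act.long().unsqueeze(1)).squeeze(1)
    td_error = target - q
    # Importance-sampling weights correct the bias introduced by
    # non-uniform sampling: weighted squared TD error as the loss.
    w = torch.as_tensor(weights, dtype=torch.float32)
    loss = (w * td_error.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Feed the new |TD error| back into the buffer as the priority signal.
    buffer.update_priorities(idx, td_error.detach().abs().numpy())
```

In the paper, beta is annealed from its initial value toward 1 over the course of training, so the bias correction becomes exact as learning converges; alpha and the beta schedule are the two main hyperparameters to tune.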

Referenced by (2)
Subject (surface form when different) Predicate
OpenAI Baselines implementsAlgorithm
Atari deep Q-network ("Prioritized Experience Replay") inspiredAlgorithm
