Prioritized Experience Replay DQN
E98475
Prioritized Experience Replay DQN is a variant of the Deep Q-Network algorithm that improves learning efficiency by replaying informative transitions, those with large temporal-difference error, more often than a uniform sample from the replay buffer would.
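For reference, these are the two quantities behind the alpha and beta hyperparameters listed in the statements below, as defined in the Prioritized Experience Replay paper. A transition $i$ is sampled with probability

$$P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}},$$

where the priority is $p_i = |\delta_i| + \epsilon$ in the proportional variant ($\delta_i$ the transition's temporal-difference error) or $p_i = 1/\mathrm{rank}(i)$ in the rank-based variant, and $\alpha$ interpolates between uniform replay ($\alpha = 0$) and greedy prioritization. The bias this introduces is corrected with importance-sampling weights

$$w_i = \left( \frac{1}{N} \cdot \frac{1}{P(i)} \right)^{\beta},$$

normalized by $\max_i w_i$ for stability, with $\beta$ annealed toward 1 over the course of training.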
Aliases (1)
Statements (48)
| Predicate | Object |
|---|---|
| instanceOf | Deep Q-Network variant → deep reinforcement learning algorithm |
| addresses | inefficiency of uniform experience replay → learning from many uninformative transitions |
| aimsTo | improve learning efficiency → improve sample efficiency → speed up convergence |
| applicationDomain | Atari game playing → control tasks |
| basedOn | Deep Q-Network |
| benefit | can improve performance on Atari 2600 benchmarks → focuses updates on transitions with high learning potential |
| category | value-based deep reinforcement learning |
| compatibleWith | Double DQN → Dueling DQN → other off-policy value-based methods |
| coreIdea | prioritize transitions with large temporal-difference error → sample more informative transitions with higher probability |
| evaluation | outperforms baseline DQN with uniform replay on many games |
| extends | uniform experience replay |
| field | reinforcement learning |
| hasComponent | importance sampling weight computation → priority update mechanism → priority-based sampling mechanism |
| hyperparameter | alpha controls degree of prioritization → beta controls strength of importance sampling correction |
| influenced | later prioritized replay methods in RL |
| introducedInPaper | Prioritized Experience Replay |
| learningSignal | temporal-difference error magnitude |
| modifies | sampling distribution over replay buffer |
| proposedBy | David Silver → Ioannis Antonoglou → John Quan → Tom Schaul |
| publishedAt | International Conference on Learning Representations 2016 |
| requires | correction of sampling bias via importance sampling → storage of priorities alongside transitions in replay buffer |
| samplingStrategy | proportional prioritization → rank-based prioritization |
| tradeOff | focus on rare high-error transitions vs coverage of state space |
| trainingType | off-policy learning |
| uses | experience replay buffer → importance sampling exponent hyperparameter beta → importance sampling weights → neural network function approximator → priority exponent hyperparameter alpha → stochastic sampling from replay buffer → temporal-difference error as priority signal |
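Taken together, the coreIdea, hyperparameter, requires, and samplingStrategy statements above describe a concrete data structure. Below is a minimal Python sketch of the proportional variant; the class name, transition format, and O(N) sampling are illustrative choices, not code from any referenced implementation (production versions typically use a sum-tree for O(log N) sampling).

```python
# Minimal sketch of proportional prioritized replay: priorities p_i = |TD error| + eps,
# sampling probabilities P(i) proportional to p_i**alpha, and importance-sampling
# weights (N * P(i))**(-beta) normalized by their maximum.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # degree of prioritization (0 = uniform replay)
        self.eps = eps              # keeps every priority strictly positive
        self.storage = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # replayed at least once before their TD error is known.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.storage)
        probs = self.priorities[:n] ** self.alpha
        probs /= probs.sum()                    # P(i) = p_i^alpha / sum_k p_k^alpha
        idx = np.random.choice(n, batch_size, p=probs)
        weights = (n * probs[idx]) ** (-beta)   # importance-sampling correction
        weights /= weights.max()                # normalize for stability
        batch = [self.storage[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # After the learning step, refresh priorities from the fresh TD errors.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a DQN training loop, the returned weights multiply each sampled transition's loss before the gradient step, and update_priorities is called with the batch's new TD errors; replacing the priority definition with p_i = 1/rank(i) gives the rank-based variant.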
Referenced by (2)
| Subject (surface form when different) | Predicate |
|---|---|
| OpenAI Baselines | implementsAlgorithm |
| Atari deep Q-network ("Prioritized Experience Replay") | inspiredAlgorithm |