Hindsight Experience Replay
E98482
Hindsight Experience Replay is a reinforcement learning technique that improves sample efficiency by reinterpreting failed attempts as successful experiences toward alternative goals.
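The reinterpretation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original implementation. It uses a toy grid where a goal is simply a position the agent can occupy, a sparse reward of 0 on success and -1 otherwise, and the simplest relabelling choice: the state the agent actually reached at the end of the episode. The names `Transition`, `sparse_reward` and `relabel_with_final_goal` are hypothetical. The relabelled copy is meant to be stored in the replay buffer alongside the original episode, not in place of it.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Pos = Tuple[int, int]  # toy grid position; assumed to double as a goal


@dataclass(frozen=True)
class Transition:
    state: Pos       # position before the action
    action: int
    next_state: Pos  # position after the action (the "achieved" state)
    goal: Pos        # goal the agent was conditioned on
    reward: float


def sparse_reward(achieved: Pos, goal: Pos) -> float:
    # Assumed sparse reward: 0 when the goal is reached, -1 otherwise.
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Copy an episode, pretending its goal was the state reached at the end."""
    hindsight_goal = episode[-1].next_state
    return [
        replace(t, goal=hindsight_goal,
                reward=sparse_reward(t.next_state, hindsight_goal))
        for t in episode
    ]


# A failed episode: the agent walks right from (0, 0), but the goal is (3, 3).
original_goal = (3, 3)
positions = [(0, 0), (1, 0), (2, 0), (3, 0)]
episode = [
    Transition(s, 0, s2, original_goal, sparse_reward(s2, original_goal))
    for s, s2 in zip(positions, positions[1:])
]
relabelled = relabel_with_final_goal(episode)
print([t.reward for t in episode])     # [-1.0, -1.0, -1.0]
print([t.reward for t in relabelled])  # [-1.0, -1.0, 0.0]
```

Under the original goal every reward is -1, so the episode carries no success signal. After relabelling with the achieved goal `(3, 0)`, the final transition earns the success reward, which is what lets an off-policy learner extract a useful signal from an otherwise failed rollout.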
All labels observed (1)
| Label | Occurrences |
|---|---|
| Hindsight Experience Replay (canonical) | 2 |
Statements (45)
| Predicate | Object |
|---|---|
| instanceOf | experience replay method; reinforcement learning technique |
| abbreviation | HER |
| aimsTo | enable learning from sparse rewards; improve sample efficiency; reuse failed trajectories as successful ones for alternative goals |
| appliedTo | multi-goal environments; robotic manipulation tasks; sparse reward environments |
| assumes | goals can be derived from achieved states |
| category | off-policy data augmentation technique |
| citationCountCategory | highly cited reinforcement learning method |
| compatibleWith | DDPG (surface form: Deep Deterministic Policy Gradient); Deep Q-Learning; actor-critic methods |
| coreIdea | reinterpret failed attempts as successful experiences toward different goals |
| field | machine learning; reinforcement learning |
| implementedIn | OpenAI Baselines; Stable Baselines |
| improves | data efficiency of reinforcement learning agents; learning speed in sparse reward settings |
| influenced | Goal-Conditioned HER variants; Hindsight Policy Gradients; multi-goal RL benchmarks such as Fetch environments |
| introducedInPaper | Hindsight Experience Replay (self-link) |
| keyMechanism | relabelling goals in stored trajectories |
| modifies | replay buffer sampling strategy |
| operatesOn | goal-conditioned policies |
| proposedBy | Alex Ray; Bob McGrew; Filip Wolski; Jonas Schneider; Josh Tobin; Marcin Andrychowicz; OpenAI researchers; Peter Welinder; Rachel Fong |
| publicationYear | 2017 |
| publishedAtConference | NeurIPS (surface form: NeurIPS 2017) |
| relatedTo | Universal Value Function Approximators; goal-conditioned reinforcement learning |
| requires | goal representation in state space |
| uses | experience replay buffer; off-policy reinforcement learning |
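Several of the statements above (HER modifies the replay buffer sampling strategy, assumes goals can be derived from achieved states, and relabels goals in stored trajectories) describe one mechanism. The HER paper considers several ways of choosing hindsight goals, including "final", "future", "episode" and "random". The following is a minimal sketch of the "future" variant under stated assumptions: goals are toy grid positions, the reward is an assumed 0/-1 sparse reward, and `her_future_relabel`, `sparse_reward` and the parameter `k` (number of hindsight copies per transition) are hypothetical names. Each transition is stored once with its original goal and `k` more times with goals drawn from states the agent actually reached at that step or later in the same episode.

```python
import random
from typing import List, Tuple

State = Tuple[int, int]
# (state, action, next_state) as collected during one rollout
Step = Tuple[State, int, State]
# (state, action, next_state, goal, reward) as stored in the replay buffer
Stored = Tuple[State, int, State, State, float]


def sparse_reward(achieved: State, goal: State) -> float:
    # Assumed sparse reward: 0 when the goal is reached, -1 otherwise.
    return 0.0 if achieved == goal else -1.0


def her_future_relabel(
    episode: List[Step], goal: State, k: int, rng: random.Random
) -> List[Stored]:
    """Store each transition with its original goal plus k 'future' relabels.

    For step t, hindsight goals are drawn uniformly from the states achieved
    at steps t..T-1 of the same episode (assumes a goal is just an achieved
    state, as in this toy grid).
    """
    stored: List[Stored] = []
    T = len(episode)
    for t, (s, a, s_next) in enumerate(episode):
        # Original experience, usually a failure under a sparse reward.
        stored.append((s, a, s_next, goal, sparse_reward(s_next, goal)))
        for _ in range(k):
            future_t = rng.randrange(t, T)  # index in [t, T)
            g = episode[future_t][2]        # a state actually reached later
            stored.append((s, a, s_next, g, sparse_reward(s_next, g)))
    return stored


rng = random.Random(0)
episode = [((0, 0), 0, (1, 0)), ((1, 0), 0, (2, 0)), ((2, 0), 0, (3, 0))]
batch = her_future_relabel(episode, goal=(3, 3), k=4, rng=rng)
print(len(batch))  # 15: 3 original transitions + 3 * 4 relabelled copies
```

A larger `k` raises the share of relabelled (often successful) data relative to real goals. Relabelling can also be done lazily when minibatches are sampled rather than eagerly when transitions are stored; the eager form is shown only because it is easier to read.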
Referenced by (2)
Full triples — surface form annotated when it differs from this entity's canonical label.