Hindsight Experience Replay
E98482
Hindsight Experience Replay is a reinforcement learning technique that improves sample efficiency by reinterpreting failed attempts as successful experiences toward alternative goals.
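The relabelling idea can be illustrated with a minimal sketch. It assumes a toy setting in which the goal space equals the state space and the reward is sparse (0 when the goal is reached, -1 otherwise). The names `Transition`, `sparse_reward`, `achieved_goal`, and `relabel_with_final_goal` are hypothetical and chosen only for this illustration; they are not taken from any particular library.

```python
from typing import List, NamedTuple, Tuple

State = Tuple[float, ...]
Goal = Tuple[float, ...]


class Transition(NamedTuple):
    state: State
    action: int
    next_state: State
    goal: Goal
    reward: float


def sparse_reward(achieved: Goal, goal: Goal, tol: float = 1e-6) -> float:
    """Assumed sparse reward: 0 if the goal is reached, otherwise -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def achieved_goal(state: State) -> Goal:
    """Hypothetical mapping from state to goal; here they share one space."""
    return state


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Copy an episode, pretending the last achieved state was the goal."""
    new_goal = achieved_goal(episode[-1].next_state)
    return [
        t._replace(
            goal=new_goal,
            reward=sparse_reward(achieved_goal(t.next_state), new_goal),
        )
        for t in episode
    ]


# A failed episode: the agent wanted to reach 5.0 but only got to 2.0.
episode = [
    Transition((0.0,), 1, (1.0,), (5.0,), -1.0),
    Transition((1.0,), 1, (2.0,), (5.0,), -1.0),
]
relabelled = relabel_with_final_goal(episode)

# Both the original and the relabelled transitions are stored for replay.
replay_buffer = episode + relabelled
```

In this sketch the original episode contains only failure rewards, while the relabelled copy ends with a success reward for the goal that was actually reached, which gives an off-policy learner a non-trivial signal.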
Statements (45)
| Predicate | Object |
|---|---|
| instanceOf | experience replay method; reinforcement learning technique |
| abbreviation | HER |
| aimsTo | enable learning from sparse rewards; improve sample efficiency; reuse failed trajectories as successful ones for alternative goals |
| appliedTo | multi-goal environments; robotic manipulation tasks; sparse reward environments |
| assumes | goals can be derived from achieved states |
| category | off-policy data augmentation technique |
| citationCountCategory | highly cited reinforcement learning method |
| compatibleWith | Deep Deterministic Policy Gradient; Deep Q-Learning; actor-critic methods |
| coreIdea | reinterpret failed attempts as successful experiences toward different goals |
| field | machine learning; reinforcement learning |
| implementedIn | OpenAI Baselines; Stable Baselines |
| improves | data efficiency of reinforcement learning agents; learning speed in sparse reward settings |
| influenced | Goal-Conditioned HER variants; Hindsight Policy Gradients; multi-goal RL benchmarks such as Fetch environments |
| introducedInPaper | Hindsight Experience Replay |
| keyMechanism | relabelling goals in stored trajectories |
| modifies | replay buffer sampling strategy |
| operatesOn | goal-conditioned policies |
| proposedBy | Alex Ray; Bob McGrew; Filip Wolski; Jonas Schneider; Josh Tobin; Marcin Andrychowicz; OpenAI researchers; Peter Welinder; Rachel Fong |
| publicationYear | 2017 |
| publishedAtConference | NeurIPS 2017 |
| relatedTo | Universal Value Function Approximators; goal-conditioned reinforcement learning |
| requires | goal representation in state space |
| uses | experience replay buffer; off-policy reinforcement learning |
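The `modifies` and `assumes` statements above can be made concrete with a short sketch of relabelling at sampling time. It assumes a one-dimensional toy task where the achieved goal is simply the next state, and it draws the substitute goal from a state reached later in the same episode. The names `Step`, `reward_fn`, and `sample_with_future_goal` are hypothetical, and the default `relabel_prob=0.8` is an illustrative value rather than a prescribed setting.

```python
import random
from typing import List, Tuple

# Hypothetical minimal transition layout: (state, action, next_state, goal).
Step = Tuple[float, int, float, float]


def reward_fn(achieved: float, goal: float, tol: float = 1e-6) -> float:
    """Assumed sparse reward: 0 if the goal is reached, otherwise -1."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0


def sample_with_future_goal(
    episode: List[Step],
    rng: random.Random,
    relabel_prob: float = 0.8,
) -> Tuple[float, int, float, float, float]:
    """Sample one transition, optionally swapping in a goal achieved later."""
    t = rng.randrange(len(episode))
    state, action, next_state, goal = episode[t]
    if rng.random() < relabel_prob:
        # Pick a step at or after t and use the state it reached as the goal.
        future = rng.randrange(t, len(episode))
        goal = episode[future][2]
    # The reward is recomputed for whichever goal the sample now carries.
    return state, action, next_state, goal, reward_fn(next_state, goal)


# An episode that aimed for 9.0 but only moved from 0.0 to 3.0.
episode = [(0.0, 1, 1.0, 9.0), (1.0, 1, 2.0, 9.0), (2.0, 1, 3.0, 9.0)]
rng = random.Random(0)
batch = [sample_with_future_goal(episode, rng) for _ in range(32)]
```

Because the reward is recomputed from the achieved state and the substituted goal, this kind of sampling only works when a goal can be derived from every achieved state, as the `assumes` row states.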