Hindsight Policy Gradients
E441117
goal-conditioned reinforcement learning method
policy gradient method
reinforcement learning algorithm
Hindsight Policy Gradients is a reinforcement learning algorithm that extends policy gradient methods by retrospectively reinterpreting failed trajectories as successes for alternative goals, improving learning efficiency in sparse-reward environments.
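
The relabeling idea in the description can be illustrated with a small sketch. It assumes a hypothetical sparse reward that pays 1 only when a visited state equals the goal; the names `sparse_reward`, `trajectory_return`, `hindsight_goals` and the toy states are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Tuple

State = Tuple[int, ...]


def sparse_reward(state: State, goal: State) -> float:
    """Hypothetical goal-conditioned reward: 1 only when the state equals the goal."""
    return 1.0 if state == goal else 0.0


def trajectory_return(states: List[State], goal: State) -> float:
    """Undiscounted return of the visited states when judged against `goal`."""
    return sum(sparse_reward(s, goal) for s in states)


def hindsight_goals(states: List[State]) -> List[State]:
    """Goals that were actually achieved along the trajectory, in visit order."""
    seen: List[State] = []
    for s in states:
        if s not in seen:
            seen.append(s)
    return seen


# States reached after each action (assumed toy data). The original goal is never reached.
visited: List[State] = [(0, 1), (1, 1), (1, 0)]
original_goal: State = (0, 0)

original_return = trajectory_return(visited, original_goal)

# Judged against goals it did achieve, the same trajectory carries a learning signal.
relabeled_returns: Dict[State, float] = {
    g: trajectory_return(visited, g) for g in hindsight_goals(visited)
}
```

Under the original goal the trajectory yields no reward, while every relabeled goal yields a nonzero return. This is the extra signal that hindsight methods exploit in sparse-reward settings.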
All labels observed (1)
| Label | Occurrences |
|---|---|
| Hindsight Policy Gradients (canonical) | 1 |
Statements (44)
| Predicate | Object |
|---|---|
| instanceOf | goal-conditioned reinforcement learning method; policy gradient method; reinforcement learning algorithm |
| addressesProblem | sample inefficiency in policy gradient methods; sparse reward reinforcement learning |
| appliedTo | navigation tasks; robotic manipulation tasks |
| arXivId | arXiv:1805. hindsight-policy-gradients (approximate, not exact id) |
| category | model-free reinforcement learning |
| comparedWith | actor-critic methods without hindsight; standard REINFORCE |
| evaluationMetric | final task success rate; learning speed in sparse reward settings |
| extends | REINFORCE algorithm; standard policy gradient methods |
| improves | sample efficiency of policy gradient methods |
| introducedBy | Paulo Rauber; Avinash Ummadisingu; Filipe Mutz; Jürgen Schmidhuber |
| introducedInPaper | Hindsight Policy Gradients |
| keyIdea | derive unbiased policy gradient estimators with hindsight goals; reinterpret failed trajectories as successful for alternative goals; use hindsight to construct additional learning signals |
| operatesOn | continuous control tasks; goal-conditioned Markov decision processes; sparse reward environments |
| optimizationTarget | expected return over goals |
| provides | unbiased gradient estimator under certain assumptions |
| publishedAs | arXiv preprint |
| relatedTo | Hindsight Experience Replay; goal-conditioned policies; off-policy reinforcement learning; on-policy reinforcement learning |
| requires | a goal-conditioned reward function; access to achieved goals along a trajectory |
| supports | continuous action spaces; high-dimensional state spaces |
| uses | importance sampling ratios for goal relabeling |
| usesConcept | goal relabeling; hindsight; importance sampling; policy gradients |
| yearIntroduced | 2017 |
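
The `uses` and `provides` statements above mention importance sampling ratios and an unbiased estimator. The sketch below shows one way such ratios can correct for the fact that actions were sampled while pursuing the original goal `g` but are evaluated under a hindsight goal `g'`. It is a per-decision importance-weighting sketch under stated assumptions, not the paper's exact estimator: the function name `weighted_rewards_to_go`, the indexing convention, and the toy policy table are all hypothetical.

```python
import itertools
import math
from typing import List


def weighted_rewards_to_go(logp_original: List[float],
                           logp_hindsight: List[float],
                           rewards: List[float]) -> List[float]:
    """Importance-weighted reward-to-go for one trajectory and one hindsight goal.

    logp_original[t]  : log pi(a_t | s_t, g)  for the goal g actually pursued
    logp_hindsight[t] : log pi(a_t | s_t, g') for the hindsight goal g'
    rewards[t]        : reward of step t evaluated under g'
    """
    weighted = []
    log_ratio = 0.0
    for lp_g, lp_h, r in zip(logp_original, logp_hindsight, rewards):
        log_ratio += lp_h - lp_g              # cumulative ratio through step t
        weighted.append(math.exp(log_ratio) * r)
    # Suffix sums: entry t is the factor that would multiply
    # grad log pi(a_t | s_t, g') in a policy gradient estimate.
    out, running = [], 0.0
    for w in reversed(weighted):
        running += w
        out.append(running)
    return out[::-1]


# Toy check with assumed numbers: two actions, a state-independent policy
# pi[goal][action], and reward 1 when the action matches the hindsight goal.
pi = {0: [0.7, 0.3], 1: [0.4, 0.6]}
g, g_hindsight = 0, 1

estimate = 0.0
for actions in itertools.product([0, 1], repeat=2):
    prob = math.prod(pi[g][a] for a in actions)   # trajectory sampled while pursuing g
    lp_g = [math.log(pi[g][a]) for a in actions]
    lp_h = [math.log(pi[g_hindsight][a]) for a in actions]
    rs = [1.0 if a == g_hindsight else 0.0 for a in actions]
    estimate += prob * weighted_rewards_to_go(lp_g, lp_h, rs)[0]

# Expected two-step return when the policy actually pursues g'.
expected = 2 * pi[g_hindsight][g_hindsight]
```

In this toy setting, averaging the weighted return over trajectories sampled for `g` recovers the expected return for `g'`, which is the property the importance ratios are meant to provide. The cumulative ratios can have high variance on long trajectories, so practical implementations often add some form of normalization or a baseline.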
Referenced by (1)
Full triples — surface form annotated when it differs from this entity's canonical label.