PPO
E98478
PPO (Proximal Policy Optimization) is a popular reinforcement learning algorithm known for its stability and sample efficiency in training complex policies, especially in continuous control and high-dimensional environments.
Statements (49)
| Predicate | Object |
|---|---|
| instanceOf | reinforcement learning algorithm |
| abbreviationFor | Proximal Policy Optimization |
| aimsFor | sample efficiency; stable policy updates |
| commonlyUsedIn | MuJoCo control tasks; OpenAI Gym benchmarks; game playing; robotics control |
| designedFor | complex policies; continuous control tasks; high-dimensional environments |
| developedBy | OpenAI |
| fullName | Proximal Policy Optimization |
| hasVariant | PPO-Clip; PPO-Penalty |
| implementedIn | PyTorch RL libraries; RLlib; Stable-Baselines3; TensorFlow Agents |
| improvesUpon | TRPO |
| introducedInPaper | Proximal Policy Optimization Algorithms |
| keyIdea | approximates trust region methods without complex constraints; constrains policy updates to be proximal to the old policy; uses clipped surrogate objective |
| objectiveIncludes | entropy bonus (in many implementations) |
| oftenCombinedWith | GAE; advantage estimation |
| optimizationType | on-policy; policy gradient |
| primaryAuthors | Alec Radford; Filip Wolski; John Schulman; Oleg Klimov; Prafulla Dhariwal |
| property | relatively easy to implement; robust to hyperparameter choices; widely adopted as a default RL baseline |
| publicationYear | 2017 |
| relatedTo | A2C; A3C; TRPO |
| supports | continuous action spaces; discrete action spaces |
| trainingStyle | mini-batch updates; multiple epochs over collected trajectories |
| uses | clipping parameter epsilon; importance sampling ratio; stochastic gradient ascent; surrogate objective function |
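The keyIdea and uses statements above (clipped surrogate objective, importance sampling ratio, clipping parameter epsilon) can be sketched as a minimal per-sample loss computation. This is an illustrative reduction, not a full PPO implementation: the function name `ppo_clip_loss` and its signature are hypothetical, and real implementations batch this over trajectories and negate it for gradient descent.

```python
import math

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    The importance sampling ratio pi_new(a|s) / pi_old(a|s) is computed
    from log-probabilities for numerical stability. Clipping the ratio to
    [1 - epsilon, 1 + epsilon] and taking the minimum yields a pessimistic
    bound that removes the incentive to move far from the old policy --
    this is how PPO-Clip approximates TRPO's trust region without an
    explicit constraint.
    """
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped_ratio = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)
```

For a positive advantage, the clipped term caps the objective once the ratio exceeds 1 + epsilon, so further increasing the action's probability yields no additional gradient signal.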
Referenced by (3)
| Subject (surface form when different) | Predicate |
|---|---|
| OpenAI Baselines | implementsAlgorithm |
| Stable Baselines | supportsAlgorithm |
| TF-Agents | supportsAlgorithmFamily |