PPO2
E98479
PPO2 is a refined variant of Proximal Policy Optimization (PPO), a policy gradient reinforcement learning algorithm. It improves on implementation details of the original PPO and is designed for stable, efficient training on both continuous and discrete control tasks.
Statements (47)
| Predicate | Object |
|---|---|
| instanceOf | policy gradient method; reinforcement learning algorithm |
| abbreviationOf | Proximal Policy Optimization 2 |
| aimsTo | improve sample efficiency; improve training stability |
| avoids | second-order optimization used in TRPO |
| basedOn | Proximal Policy Optimization |
| commonlyUsedFor | benchmark continuous control tasks; game-playing agents; robotics control tasks |
| commonlyUsedWith | OpenAI Gym environments |
| contrastsWith | Trust Region Policy Optimization |
| controls | policy update step size via clipping parameter |
| designedFor | continuous control tasks; discrete control tasks; efficient policy gradient training; stable policy gradient training |
| goal | balance exploration and exploitation; prevent destructive policy updates |
| hasFeature | clipped value function loss; entropy regularization; mini-batch stochastic gradient descent; multiple epochs over the same batch of data; separate policy and value networks; value function baseline |
| hasHyperparameter | GAE lambda; clip range; discount factor gamma; entropy coefficient; learning rate; mini-batch size; number of epochs; value function coefficient |
| improvesUpon | original PPO implementation details |
| isImplementedIn | Stable-Baselines; Stable-Baselines3 (as PPO successor, conceptually similar) |
| isVariantOf | PPO |
| optimizes | stochastic policies |
| supports | on-policy learning |
| supportsActionSpaces | continuous action spaces; discrete action spaces |
| trainingType | actor-critic |
| updateType | first-order optimization |
| uses | advantage estimation; clipped surrogate objective; generalized advantage estimation; gradient-based optimization |
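Several of the statements above (generalized advantage estimation, the clipped surrogate objective, the clipped value function loss, and entropy regularization) can be illustrated with a minimal pure-Python sketch. The function names `gae_advantages` and `ppo2_loss`, their signatures, and the default hyperparameter values are assumptions chosen for illustration. They are not taken from this record or from any particular library.

```python
import math


def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    rewards, values, dones are equal-length lists; dones[t] is True when the
    episode ended after step t. last_value bootstraps the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error, cut off at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages


def clip(x, low, high):
    return min(max(x, low), high)


def ppo2_loss(new_logp, old_logp, advantages, new_values, old_values, returns,
              entropies, clip_range=0.2, vf_coef=0.5, ent_coef=0.01):
    """Scalar loss to minimize: policy term + value term - entropy bonus."""
    n = len(advantages)
    policy_loss = 0.0
    value_loss = 0.0
    for i in range(n):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(new_logp[i] - old_logp[i])
        clipped_ratio = clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
        # Pessimistic surrogate: take the smaller objective, then negate it.
        policy_loss -= min(ratio * advantages[i], clipped_ratio * advantages[i])
        # Clipped value loss: limit how far the prediction moves per update.
        v_clipped = old_values[i] + clip(new_values[i] - old_values[i],
                                         -clip_range, clip_range)
        value_loss += 0.5 * max((new_values[i] - returns[i]) ** 2,
                                (v_clipped - returns[i]) ** 2)
    entropy = sum(entropies) / n
    return policy_loss / n + vf_coef * value_loss / n - ent_coef * entropy
```

In this sketch the clip range bounds how much a single sample can reward moving the policy away from the one that collected the data. With a positive advantage, a ratio above `1 + clip_range` earns no additional objective, while with a negative advantage the unclipped term is kept, so harmful moves are still penalized in full. A real training loop would evaluate this loss on mini-batches for several epochs over the same rollout and minimize it with a first-order optimizer.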
Referenced by (1)
| Subject (surface form when different) | Predicate |
|---|---|
| OpenAI Baselines | implementsAlgorithm |