PPO2

E98479

PPO2 is an improved variant of the Proximal Policy Optimization (PPO) reinforcement learning algorithm, designed for stable and efficient policy gradient training on both continuous and discrete control tasks.
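Much of the training stability in PPO-style methods comes from the clipped surrogate objective, which limits how far one update can move the policy away from the policy that collected the data. The following is a minimal plain-Python sketch of the standard clipped surrogate loss for one batch. It assumes you already have per-sample log-probabilities under the new and old policies and precomputed advantage estimates. The function name `clipped_surrogate_loss`, its arguments, and the default `clip_range=0.2` are hypothetical choices for illustration and are not the Stable Baselines API.

```python
import math
from typing import Sequence


def clipped_surrogate_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_range: float = 0.2,  # hypothetical default, for illustration only
) -> float:
    """Return the batch-mean PPO clipped surrogate objective, negated as a loss.

    new_log_probs / old_log_probs: log pi(a_t | s_t) under the current policy
    and under the policy that collected the data. advantages: A_t estimates.
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages) > 0):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio into [1 - clip_range, 1 + clip_range].
        clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    # Negate so that minimizing this loss maximizes the surrogate objective.
    return -total / len(advantages)
```

For example, with `clip_range=0.2`, a sample whose probability ratio is 1.5 and whose advantage is 1.0 contributes 1.2 rather than 1.5, so the optimizer gains nothing by pushing the ratio further past the clip boundary. This plays the role that the trust-region constraint plays in TRPO, but uses only first-order gradients.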


All labels observed (1)

Label Occurrences
PPO2 canonical 1

Statements (47)

Predicate Object
instanceOf policy gradient method
reinforcement learning algorithm
abbreviationOf Proximal Policy Optimization
surface form: Proximal Policy Optimization 2
aimsTo improve sample efficiency
improve training stability
avoid second-order optimization used in TRPO
basedOn Proximal Policy Optimization
commonlyUsedFor benchmark continuous control tasks
game-playing agents
robotics control tasks
commonlyUsedWith OpenAI Gym
surface form: OpenAI Gym environments
contrastsWith TRPO
surface form: Trust Region Policy Optimization
controls policy update step size via clipping parameter
designedFor continuous control tasks
discrete control tasks
efficient policy gradient training
stable policy gradient training
goal balance exploration and exploitation
prevent destructive policy updates
hasFeature clipped value function loss
entropy regularization
mini-batch stochastic gradient descent
multiple epochs over the same batch of data
separate policy and value networks
value function baseline
hasHyperparameter GAE lambda
clip range
discount factor gamma
entropy coefficient
learning rate
mini-batch size
number of epochs
value function coefficient
improvesUpon original PPO implementation details
isImplementedIn Stable Baselines
surface form: Stable-Baselines
Stable Baselines
surface form: Stable-Baselines3 (as PPO successor, conceptually similar)
isVariantOf PPO
optimizes stochastic policies
supports on-policy learning
supportsActionSpaces continuous action spaces
discrete action spaces
trainingType actor-critic
updateType first-order optimization
uses advantage estimation
clipped surrogate objective
generalized advantage estimation
gradient-based optimization
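The advantage estimation, clipped value function loss, value function baseline, and entropy regularization listed above typically combine into a single training loss. The following plain-Python sketch shows one common formulation. It computes generalized advantage estimation (GAE) from the discount factor gamma and the GAE lambda. It then forms a total loss that adds a clipped value loss (weighted by a value function coefficient) and subtracts an entropy bonus (weighted by an entropy coefficient). The function names, argument names, the `dones` convention, and the default values are assumptions chosen for illustration. They are not taken from this entry or from any specific library.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,  # discount factor (illustrative default)
    lam: float = 0.95,  # GAE lambda (illustrative default)
) -> Tuple[List[float], List[float]]:
    """Generalized advantage estimation over one rollout of length T.

    values[t] is the critic's V(s_t); last_value is V(s_T), used to bootstrap
    the final step. dones[t] is True if the episode ended after step t.
    Returns (advantages, returns), where returns = advantages + values.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        non_terminal = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * non_terminal - values[t]
        # Exponentially weighted sum of TD errors, discounted by gamma * lambda.
        gae = delta + gamma * lam * non_terminal * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_total_loss(
    policy_loss: float,
    values_new: Sequence[float],
    values_old: Sequence[float],
    returns: Sequence[float],
    entropy: float,
    clip_range_vf: float = 0.2,  # hypothetical value-clipping range
    vf_coef: float = 0.5,  # value function coefficient (illustrative)
    ent_coef: float = 0.01,  # entropy coefficient (illustrative)
) -> float:
    """Combine a policy loss, a clipped value loss, and an entropy bonus.

    policy_loss: e.g. a negated clipped surrogate objective.
    entropy: mean policy entropy over the batch.
    """
    v_losses = []
    for v_new, v_old, ret in zip(values_new, values_old, returns):
        # Keep the new value prediction within clip_range_vf of the old one.
        v_clipped = v_old + min(max(v_new - v_old, -clip_range_vf), clip_range_vf)
        # Pessimistic choice: the larger of the unclipped and clipped errors.
        v_losses.append(max((v_new - ret) ** 2, (v_clipped - ret) ** 2))
    value_loss = 0.5 * sum(v_losses) / len(v_losses)
    # Subtracting the entropy term rewards more exploratory policies.
    return policy_loss + vf_coef * value_loss - ent_coef * entropy
```

In a full training loop, a loss like this would be minimized with mini-batch stochastic gradient descent for several epochs over the same rollout. The mini-batch size, number of epochs, and learning rate hyperparameters govern that stage.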

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.