PPO

E98478

PPO (Proximal Policy Optimization) is a widely used on-policy policy-gradient reinforcement learning algorithm, introduced by OpenAI in 2017. It is known for stable policy updates and good sample efficiency when training complex policies, especially in continuous control and high-dimensional environments.

All labels observed (1)

Label Occurrences
PPO canonical 7

Statements (49)

Predicate Object
instanceOf reinforcement learning algorithm
abbreviationFor Proximal Policy Optimization
aimsFor sample efficiency
stable policy updates
commonlyUsedIn MuJoCo control tasks
OpenAI Gym benchmarks
game playing
robotics control
designedFor complex policies
continuous control tasks
high-dimensional environments
developedBy OpenAI
fullName Proximal Policy Optimization
hasVariant PPO-Clip
PPO-Penalty
implementedIn PyTorch RL libraries
RLlib
Stable Baselines
surface form: Stable-Baselines3

TF-Agents
surface form: TensorFlow Agents
improvesUpon TRPO
introducedInPaper Proximal Policy Optimization
surface form: Proximal Policy Optimization Algorithms
keyIdea approximates trust region methods without complex constraints
constrains policy updates to be proximal to the old policy
uses clipped surrogate objective
objectiveIncludes entropy bonus (in many implementations)
oftenCombinedWith Generalized Advantage Estimation
surface form: GAE

advantage estimation
optimizationType on-policy
policy gradient
primaryAuthors Alec Radford
Filip Wolski
John Schulman
Oleg Klimov
Prafulla Dhariwal
property relatively easy to implement
robust to hyperparameter choices
widely adopted as a default RL baseline
publicationYear 2017
relatedTo A2C
A3C
TRPO
supports continuous action spaces
discrete action spaces
trainingStyle mini-batch updates
multiple epochs over collected trajectories
uses clipping parameter epsilon
importance sampling ratio
stochastic gradient ascent
surrogate objective function
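
The keyIdea, uses, and oftenCombinedWith statements above can be made concrete with a minimal sketch. It assumes per-sample log-probabilities and value estimates are already available, reads GAE as Generalized Advantage Estimation, and treats epsilon = 0.2, gamma = 0.99, and lambda = 0.95 as illustrative defaults rather than values taken from this entry. The function names clipped_surrogate and gae_advantages are hypothetical, and the code is not the implementation used by any library listed under implementedIn.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the old (data-collecting) policy.
    epsilon: clipping parameter (0.2 is an assumed, illustrative default).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance sampling ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio within [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Pessimistic bound: take the smaller of the two candidate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory segment.

    values must hold len(rewards) + 1 entries; the last one is the bootstrap
    value of the state after the final step (use 0.0 if that state is
    terminal). Episode boundaries inside the segment are not handled here.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    adv = gae_advantages([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0)
    print(adv)
    print(clipped_surrogate([math.log(2.0)] * 2, [0.0, 0.0], adv))
```

In a training loop matching the trainingStyle statements, the advantages would be computed once per batch of collected trajectories, and the objective would then be maximized by stochastic gradient ascent over several epochs of mini-batches. That requires an automatic differentiation framework, which this plain-Python sketch leaves out.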

Referenced by (7)

Full triples — surface form annotated when it differs from this entity's canonical label.