Proximal Policy Optimization

E162136

Proximal Policy Optimization (PPO) is a widely used on-policy reinforcement learning algorithm that stabilizes policy gradient training by optimizing a clipped surrogate objective, which limits how far each update can move the policy.
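
For reference, the clipped objective named in the description is commonly written as below. The symbols are notation assumed here rather than taken from this page: r_t(θ) is the probability ratio between the new and old policy, Â_t is an advantage estimate, and ε is the clip range.

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
                  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Taking the minimum of the clipped and unclipped terms removes the incentive to push the ratio outside the interval [1 - ε, 1 + ε] in the direction that would increase the objective.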



Statements (49)

Predicate Object
instanceOf policy gradient method
reinforcement learning algorithm
abbreviation PPO
aimsTo constrain policy updates
improve sample efficiency
improve training stability
algorithmFamily actor-critic
arXivId 1707.06347
citationVenue arXiv preprint
commonlyUsedFor game playing
robotics control
simulated control tasks
comparedTo TRPO
surface form: Trust Region Policy Optimization
designGoal avoid second-order optimization
simplify trust region methods
developedAt OpenAI
evaluationBenchmarks Atari 2600
surface form: Atari 2600 games
MuJoCo continuous control tasks
OpenAI Gym
influenced many modern deep RL baselines
introducedBy Alec Radford
Filip Wolski
John Schulman
Oleg Klimov
Prafulla Dhariwal
introducedInPaper Proximal Policy Optimization
surface form: Proximal Policy Optimization Algorithms
introducedInYear 2017
keyHyperparameter GAE lambda
clip range
discount factor
entropy coefficient
learning rate
minibatch size
number of epochs
objectiveContains clipping term
entropy bonus (optional)
objectiveType surrogate objective
oftenImplementedIn PyTorch
TensorFlow
optimizationType on-policy
policyRepresentation neural network
relatedTo TRPO
surface form: Trust Region Policy Optimization
supports continuous action spaces
discrete action spaces
updateRule multiple epochs of minibatch updates per batch of data
uses clipped surrogate objective
policy gradient
stochastic gradient ascent
valueFunction critic network
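
The objectiveContains, keyHyperparameter, and updateRule statements above can be illustrated with a minimal NumPy sketch. It is not the reference implementation: the function names `clipped_surrogate` and `minibatch_indices`, their signatures, and the default clip range of 0.2 are assumptions made for illustration, and the value-function loss, entropy bonus, and gradient step are omitted.

```python
import numpy as np


def clipped_surrogate(new_logp, old_logp, advantages, clip_range=0.2):
    """Mean clipped surrogate objective (a quantity to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the data-collecting policy (hypothetical inputs).
    """
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    advantages = np.asarray(advantages)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantages
    # The elementwise minimum makes the objective a pessimistic bound.
    return float(np.mean(np.minimum(unclipped, clipped)))


def minibatch_indices(batch_size, minibatch_size, num_epochs, rng):
    """Yield shuffled index arrays: several epochs over one batch of data."""
    for _ in range(num_epochs):
        perm = rng.permutation(batch_size)
        for start in range(0, batch_size, minibatch_size):
            yield perm[start:start + minibatch_size]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    old_logp = rng.normal(size=8)
    new_logp = old_logp + 0.1 * rng.normal(size=8)
    adv = rng.normal(size=8)
    for idx in minibatch_indices(8, 4, num_epochs=3, rng=rng):
        print(idx, clipped_surrogate(new_logp[idx], old_logp[idx], adv[idx]))
```

In a full training loop, each yielded index array would select the samples for one stochastic gradient ascent step on this objective, and the batch would be discarded after the final epoch because the method is on-policy.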

Referenced by (13)

Full triples — surface form annotated when it differs from this entity's canonical label.

John Schulman notableWork Proximal Policy Optimization
John Schulman authorOf Proximal Policy Optimization
this entity surface form: Proximal Policy Optimization Algorithms
PPO fullName Proximal Policy Optimization
PPO abbreviationFor Proximal Policy Optimization
PPO introducedInPaper Proximal Policy Optimization
this entity surface form: Proximal Policy Optimization Algorithms
PPO hasVariant Proximal Policy Optimization
this entity surface form: PPO-Clip
PPO2 basedOn Proximal Policy Optimization
PPO2 abbreviationOf Proximal Policy Optimization
this entity surface form: Proximal Policy Optimization 2
Proximal Policy Optimization introducedInPaper Proximal Policy Optimization
this entity surface form: Proximal Policy Optimization Algorithms
Generalized Advantage Estimation compatibleWith Proximal Policy Optimization
Philipp Moritz notableWork Proximal Policy Optimization
Philipp Moritz coAuthorOf Proximal Policy Optimization
this entity surface form: Proximal Policy Optimization Algorithms
Philipp Moritz hasGivenTalkOn Proximal Policy Optimization