Proximal Policy Optimization
E162136
Proximal Policy Optimization (PPO) is a reinforcement learning algorithm that improves on earlier policy gradient methods by optimizing a clipped surrogate objective, which limits how far each update can move the policy and makes training more stable and sample-efficient.
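For reference, the clipped objective is commonly written as below. The notation follows the usual presentation of the paper (arXiv 1707.06347); the symbols $r_t$, $\hat{A}_t$, $\epsilon$, $c_1$ and $c_2$ are not defined elsewhere on this page and are introduced here only for illustration.

```latex
% Probability ratio between the updated policy and the policy that collected the data
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}

% Clipped surrogate objective (maximized); \epsilon is the clip range,
% \hat{A}_t an advantage estimate at timestep t
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[
  \min\!\big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\!\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \big)
\right]

% Combined objective with a value-function loss and the optional entropy bonus S
L^{\text{CLIP+VF+S}}(\theta) = \hat{\mathbb{E}}_t\!\left[
  L^{\text{CLIP}}_t(\theta) - c_1\, L^{\text{VF}}_t(\theta) + c_2\, S[\pi_\theta](s_t)
\right]
```

The `min` makes the objective a pessimistic bound: a change in the ratio that would improve the objective is ignored once it leaves the interval $[1-\epsilon, 1+\epsilon]$, while a change that makes the objective worse is still counted.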
All labels observed (5)
| Label | Occurrences |
|---|---|
| Proximal Policy Optimization (canonical) | 7 |
| Proximal Policy Optimization Algorithms | 3 |
| PPO-Clip | 1 |
| Proximal Policy Optimization 2 | 1 |
| “Proximal Policy Optimization Algorithms” | 1 |
Statements (49)
| Predicate | Object |
|---|---|
| instanceOf | policy gradient method; reinforcement learning algorithm |
| abbreviation | PPO |
| aimsTo | constrain policy updates; improve sample efficiency; improve training stability |
| algorithmFamily | actor-critic |
| arXivId | 1707.06347 |
| citationVenue | arXiv preprint |
| commonlyUsedFor | game playing; robotics control; simulated control tasks |
| comparedTo | TRPO (surface form: Trust Region Policy Optimization) |
| designGoal | avoid second-order optimization; simplify trust region methods |
| developedAt | OpenAI |
| evaluationBenchmarks | Atari 2600 (surface form: Atari 2600 games); MuJoCo continuous control tasks; OpenAI Gym |
| influenced | many modern deep RL baselines |
| introducedBy | Alec Radford; Filip Wolski; John Schulman; Oleg Klimov; Prafulla Dhariwal |
| introducedInPaper | Proximal Policy Optimization (self-link; surface form: Proximal Policy Optimization Algorithms) |
| introducedInYear | 2017 |
| keyHyperparameter | GAE lambda; clip range; discount factor; entropy coefficient; learning rate; minibatch size; number of epochs |
| objectiveContains | clipping term; entropy bonus (optional) |
| objectiveType | surrogate objective |
| oftenImplementedIn | PyTorch; TensorFlow |
| optimizationType | on-policy |
| policyRepresentation | neural network |
| relatedTo | TRPO (surface form: Trust Region Policy Optimization) |
| supports | continuous action spaces; discrete action spaces |
| updateRule | multiple epochs of minibatch updates per batch of data |
| uses | clipped surrogate objective; policy gradient; stochastic gradient ascent |
| valueFunction | critic network |
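Several of the statements above (the clipping term, the GAE lambda and discount factor hyperparameters, and the update rule of multiple minibatch epochs per batch) can be illustrated with a minimal pure-Python sketch. This is not a reference implementation: the function names `clipped_surrogate`, `gae` and `minibatch_indices` are hypothetical, the default values are illustrative only, and the policy and critic networks are left out entirely.

```python
import math
import random


def clipped_surrogate(new_logp, old_logp, advantages, clip_range=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp are log-probabilities of the taken actions under the
    current policy and the data-collecting policy (assumed inputs).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory segment.

    Assumes the episode does not terminate inside the segment; last_value is
    the critic's estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def minibatch_indices(batch_size, minibatch_size, num_epochs, rng):
    """Yield index lists: num_epochs shuffled passes over one batch of data."""
    for _ in range(num_epochs):
        order = list(range(batch_size))
        rng.shuffle(order)
        for start in range(0, batch_size, minibatch_size):
            yield order[start:start + minibatch_size]
```

In a full training loop, each index list from `minibatch_indices` would select the samples for one stochastic gradient ascent step on the objective, after which the batch is discarded and fresh on-policy data is collected.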
Referenced by (13)
Full triples — surface form annotated when it differs from this entity's canonical label.
this entity surface form: “Proximal Policy Optimization Algorithms”
this entity surface form: Proximal Policy Optimization Algorithms
this entity surface form: PPO-Clip
this entity surface form: Proximal Policy Optimization 2
Proximal Policy Optimization → introducedInPaper → Proximal Policy Optimization (self-link; this entity surface form: Proximal Policy Optimization Algorithms)
this entity surface form: Proximal Policy Optimization Algorithms