DDPG
E98481
actor-critic algorithm
deep reinforcement learning algorithm
model-free reinforcement learning method
off-policy reinforcement learning method
DDPG (Deep Deterministic Policy Gradient) is a model-free, off-policy deep reinforcement learning algorithm designed for continuous action spaces, combining ideas from DQN and actor-critic methods.
All labels observed (3)
| Label | Occurrences |
|---|---|
| DDPG (canonical) | 3 |
| Deep Deterministic Policy Gradient | 3 |
| Continuous control with deep reinforcement learning | 1 |
Statements (50)
| Predicate | Object |
|---|---|
| instanceOf | actor-critic algorithm; deep reinforcement learning algorithm; model-free reinforcement learning method; off-policy reinforcement learning method |
| actorObjective | maximize critic-estimated Q-value |
| algorithmFamily | Q-learning inspired methods; policy gradient methods |
| basedOn | deterministic policy gradient theorem |
| category | continuous-action RL algorithm |
| commonlyEvaluatedOn | MuJoCo benchmarks; OpenAI Gym continuous control environments |
| commonlyUsedFor | continuous control tasks |
| contrastWith | Atari deep Q-network (surface form: "DQN (which handles discrete actions)"); stochastic policy gradient methods |
| criticLossType | temporal-difference error |
| criticObjective | minimize Bellman error |
| explorationStrategy | noise added to deterministic policy output |
| fullName | DDPG (self-link; surface form: "Deep Deterministic Policy Gradient") |
| handlesActionSpaceType | continuous action space |
| inputToActor | state |
| inputToCritic | state-action pair |
| inspiredBy | Atari deep Q-network (surface form: "Deep Q-Network") |
| introducedBy | Alexander Pritzel; Daan Wierstra; David Silver; Jonathan J. Hunt; Nicolas Heess; Timothy P. Lillicrap; Tom Erez; Yuval Tassa |
| introducedInPaper | DDPG (self-link; surface form: "Continuous control with deep reinforcement learning") |
| introducedInYear | 2015 |
| optimizationMethod | gradient descent |
| outputOfActor | continuous action |
| outputOfCritic | Q-value |
| policyType | deterministic policy |
| stabilityTechnique | experience replay; target networks |
| trainingParadigm | off-policy learning |
| updateType | bootstrapped TD learning |
| uses | Ornstein–Uhlenbeck process (surface form: "Ornstein-Uhlenbeck noise"); actor network; critic network; experience replay buffer; exploration noise process; soft target updates; target actor network; target critic network |
| usesFunctionApproximator | deep neural network |
| valueFunctionType | action-value function |
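
Several of the components listed in the statements above (exploration noise added to the deterministic action, the bootstrapped TD target, the critic's TD-error loss, and soft target updates) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the original authors' implementation: the names `OUNoise`, `noisy_action`, `td_target`, `critic_loss`, and `soft_update` are hypothetical, every hyperparameter value is an illustrative choice, and the actor and critic networks themselves are omitted (their outputs are passed in as plain arrays).

```python
import numpy as np


class OUNoise:
    """Discretised Ornstein-Uhlenbeck process (mean zero) for exploration noise.

    theta, sigma, and dt are illustrative assumptions, not prescribed values.
    """

    def __init__(self, size, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.state = np.zeros(size)

    def sample(self):
        # Mean-reverting drift toward zero plus Gaussian diffusion.
        drift = -self.theta * self.state * self.dt
        diffusion = (
            self.sigma
            * np.sqrt(self.dt)
            * self.rng.standard_normal(self.state.shape)
        )
        self.state = self.state + drift + diffusion
        return self.state


def noisy_action(action, noise, low, high):
    """Add exploration noise to the actor's output and clip to the action bounds."""
    return np.clip(action + noise, low, high)


def td_target(reward, next_q, done, gamma):
    """Bootstrapped target y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    next_q is assumed to come from the target critic evaluated at the
    target actor's action for the next state.
    """
    return reward + gamma * (1.0 - done) * next_q


def critic_loss(q, y):
    """Mean squared TD error between critic estimates q and targets y."""
    return float(np.mean((q - y) ** 2))


def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [(1.0 - tau) * t + tau * p for t, p in zip(target_params, online_params)]
```

In a training loop these pieces would typically be called once per sampled minibatch: compute `td_target` from replayed transitions, minimize `critic_loss`, update the actor to increase the critic's Q-value estimate, and then apply `soft_update` to both target networks with a small `tau`.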
Referenced by (7)
Full triples — surface form annotated when it differs from this entity's canonical label.
this entity, surface form: Deep Deterministic Policy Gradient
this entity, surface form: Deep Deterministic Policy Gradient
this entity, surface form: Continuous control with deep reinforcement learning
this entity, surface form: Deep Deterministic Policy Gradient