DDPG
E98481
actor-critic algorithm
deep reinforcement learning algorithm
model-free reinforcement learning method
off-policy reinforcement learning method
DDPG (Deep Deterministic Policy Gradient) is a model-free, off-policy deep reinforcement learning algorithm designed for continuous action spaces, combining ideas from DQN and actor-critic methods.
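The way DDPG combines a DQN-style critic with a deterministic actor can be summarized in a few equations. This page defines no symbols, so the notation below is an assumption that follows the usual presentation: $\mu$ is the actor, $Q$ the critic, primes mark target networks, $\gamma$ is the discount factor, and $N$ is the minibatch size drawn from the replay buffer.

```latex
\begin{aligned}
y_i &= r_i + \gamma\, Q'\big(s_{i+1},\, \mu'(s_{i+1})\big) \\
L(\theta^{Q}) &= \frac{1}{N} \sum_{i=1}^{N} \big(y_i - Q(s_i, a_i \mid \theta^{Q})\big)^2 \\
\nabla_{\theta^{\mu}} J &\approx \frac{1}{N} \sum_{i=1}^{N}
  \nabla_a Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\;
  \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}
\end{aligned}
```

The first two lines are the bootstrapped TD target and the critic loss; the third is the sampled deterministic policy gradient, which moves the actor toward actions the critic scores more highly.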
Aliases (1)
Statements (50)
| Predicate | Object |
|---|---|
| instanceOf | actor-critic algorithm; deep reinforcement learning algorithm; model-free reinforcement learning method; off-policy reinforcement learning method |
| actorObjective | maximize critic-estimated Q-value |
| algorithmFamily | Q-learning inspired methods; policy gradient methods |
| basedOn | deterministic policy gradient theorem |
| category | continuous-action RL algorithm |
| commonlyEvaluatedOn | MuJoCo benchmarks; OpenAI Gym continuous control environments |
| commonlyUsedFor | continuous control tasks |
| contrastWith | DQN (which handles discrete actions); stochastic policy gradient methods |
| criticLossType | temporal-difference error |
| criticObjective | minimize Bellman error |
| explorationStrategy | noise added to deterministic policy output |
| fullName | Deep Deterministic Policy Gradient |
| handlesActionSpaceType | continuous action space |
| inputToActor | state |
| inputToCritic | state-action pair |
| inspiredBy | Deep Q-Network |
| introducedBy | Alexander Pritzel; Daan Wierstra; David Silver; Jonathan J. Hunt; Nicolas Heess; Timothy P. Lillicrap; Tom Erez; Yuval Tassa |
| introducedInPaper | Continuous control with deep reinforcement learning |
| introducedInYear | 2015 |
| optimizationMethod | gradient descent |
| outputOfActor | continuous action |
| outputOfCritic | Q-value |
| policyType | deterministic policy |
| stabilityTechnique | experience replay; target networks |
| trainingParadigm | off-policy learning |
| updateType | bootstrapped TD learning |
| uses | Ornstein-Uhlenbeck noise; actor network; critic network; experience replay buffer; exploration noise process; soft target updates; target actor network; target critic network |
| usesFunctionApproximator | deep neural network |
| valueFunctionType | action-value function |
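Two of the components listed under `uses`, Ornstein-Uhlenbeck exploration noise and soft target updates, are small enough to sketch directly. The following is a minimal NumPy illustration and not a reference implementation: the names `OUNoise` and `soft_update` are hypothetical, the default values for `theta`, `sigma`, and `tau` are illustrative assumptions, and network parameters are represented as plain lists of arrays.

```python
import numpy as np


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (hypothetical helper).

    Produces temporally correlated noise that can be added to the
    deterministic actor output for exploration.
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        # Default values are illustrative assumptions, not taken from this page.
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean (e.g. at episode start).
        self.x = self.mu.copy()

    def sample(self):
        # Euler-Maruyama step:
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = (self.sigma * np.sqrt(self.dt)
                     * self.rng.standard_normal(self.x.shape))
        self.x = self.x + drift + diffusion
        return self.x


def soft_update(target_params, online_params, tau=0.001):
    """Soft (Polyak) target update, in place on float arrays.

    Each target array becomes tau * online + (1 - tau) * target.
    """
    for target, online in zip(target_params, online_params):
        target *= 1.0 - tau
        target += tau * online


if __name__ == "__main__":
    noise = OUNoise(size=2)
    actor_output = np.array([0.1, -0.3])  # stand-in for an actor network output
    action = np.clip(actor_output + noise.sample(), -1.0, 1.0)
    print(action.shape)
```

A small `tau` makes the target actor and target critic trail the online networks slowly, which is the stabilizing role the table attributes to target networks.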
Referenced by (4)
| Subject (surface form when different) | Predicate |
|---|---|
| OpenAI Baselines ("Deep Deterministic Policy Gradient") | implementsAlgorithm |
| Stable Baselines | supportsAlgorithm |
| TF-Agents | supportsAlgorithmFamily |