DDPG

E98481

DDPG (Deep Deterministic Policy Gradient) is a model-free, off-policy deep reinforcement learning algorithm designed for continuous action spaces, combining ideas from DQN and actor-critic methods.


Statements (50)
Predicate Object
instanceOf actor-critic algorithm
deep reinforcement learning algorithm
model-free reinforcement learning method
off-policy reinforcement learning method
actorObjective maximize critic-estimated Q-value
algorithmFamily Q-learning inspired methods
policy gradient methods
basedOn deterministic policy gradient theorem
category continuous-action RL algorithm
commonlyEvaluatedOn MuJoCo benchmarks
OpenAI Gym continuous control environments
commonlyUsedFor continuous control tasks
contrastWith DQN (which handles discrete actions)
stochastic policy gradient methods
criticLossType temporal-difference error
criticObjective minimize Bellman error
explorationStrategy noise added to deterministic policy output
fullName Deep Deterministic Policy Gradient
handlesActionSpaceType continuous action space
inputToActor state
inputToCritic state-action pair
inspiredBy Deep Q-Network
introducedBy Alexander Pritzel
Daan Wierstra
David Silver
Jonathan J. Hunt
Nicolas Heess
Timothy P. Lillicrap
Tom Erez
Yuval Tassa
introducedInPaper Continuous control with deep reinforcement learning
introducedInYear 2015
optimizationMethod gradient descent
outputOfActor continuous action
outputOfCritic Q-value
policyType deterministic policy
stabilityTechnique experience replay
target networks
trainingParadigm off-policy learning
updateType bootstrapped TD learning
uses Ornstein-Uhlenbeck noise
actor network
critic network
experience replay buffer
exploration noise process
soft target updates
target actor network
target critic network
usesFunctionApproximator deep neural network
valueFunctionType action-value function
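
Example (illustrative sketch)
Several of the statements above (bootstrapped TD learning, temporal-difference critic loss, soft target updates, Ornstein-Uhlenbeck exploration noise) can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function and class names (td_target, critic_loss, soft_update, OUNoise, noisy_action) are hypothetical, the default hyperparameters are illustrative, network parameters are stood in for by flat lists of floats, and the neural networks themselves are omitted.

```python
import math
import random


def td_target(reward, next_q, done, gamma=0.99):
    """Bootstrapped critic target: r + gamma * Q'(s', mu'(s')).

    next_q is assumed to come from the target critic evaluated at the
    target actor's action; terminal transitions do not bootstrap.
    """
    return reward + (0.0 if done else gamma * next_q)


def critic_loss(q_values, targets):
    """Mean squared TD error over a minibatch sampled from the replay buffer."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def soft_update(target_params, online_params, tau=0.001):
    """Soft target update: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (temporally correlated noise)."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.x = mu

    def reset(self):
        """Restart the process at its long-run mean (e.g. per episode)."""
        self.x = self.mu

    def sample(self):
        """Advance one step: mean-reverting drift plus Gaussian diffusion."""
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


def noisy_action(actor_output, noise, low=-1.0, high=1.0):
    """Exploration: add noise to the deterministic action, then clip to bounds."""
    return max(low, min(high, actor_output + noise))
```

In a full training loop, the actor would additionally be updated by ascending the critic's Q-value estimate with respect to the actor's parameters, which requires an automatic-differentiation library and is left out here.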

Referenced by (4)
Subject (surface form when different) Predicate
OpenAI Baselines
OpenAI Baselines ("Deep Deterministic Policy Gradient")
implementsAlgorithm
Stable Baselines
supportsAlgorithm
TF-Agents
supportsAlgorithmFamily
