DDPG

E98481

DDPG (Deep Deterministic Policy Gradient) is a model-free, off-policy deep reinforcement learning algorithm designed for continuous action spaces, combining ideas from DQN and actor-critic methods.
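The combination described above can be written out as a hedged summary: the critic is trained toward a DQN-style bootstrapped target computed with target networks, and the actor follows the deterministic policy gradient through the critic. The notation is an assumption borrowed from the usual presentation of the algorithm: Q and μ are the critic and actor with parameters θ^Q and θ^μ, primes denote target networks, N is the minibatch size, and γ is the discount factor.

```latex
\begin{align}
y_i &= r_i + \gamma\, Q'\!\left(s_{i+1},\, \mu'(s_{i+1} \mid \theta^{\mu'}) \,\middle|\, \theta^{Q'}\right) \\
L(\theta^{Q}) &= \frac{1}{N} \sum_{i} \left(y_i - Q(s_i, a_i \mid \theta^{Q})\right)^2 \\
\nabla_{\theta^{\mu}} J &\approx \frac{1}{N} \sum_{i}
  \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\,
  \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}
\end{align}
```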


All labels observed (3)

Statements (50)

Predicate Object
instanceOf actor-critic algorithm
deep reinforcement learning algorithm
model-free reinforcement learning method
off-policy reinforcement learning method
actorObjective maximize critic-estimated Q-value
algorithmFamily Q-learning inspired methods
policy gradient methods
basedOn deterministic policy gradient theorem
category continuous-action RL algorithm
commonlyEvaluatedOn MuJoCo benchmarks
OpenAI Gym continuous control environments
commonlyUsedFor continuous control tasks
contrastWith Atari deep Q-network
surface form: DQN (which handles discrete actions)

stochastic policy gradient methods
criticLossType temporal-difference error
criticObjective minimize Bellman error
explorationStrategy noise added to deterministic policy output
fullName DDPG (self-link, surface differs)
surface form: Deep Deterministic Policy Gradient
handlesActionSpaceType continuous action space
inputToActor state
inputToCritic state-action pair
inspiredBy Atari deep Q-network
surface form: Deep Q-Network
introducedBy Alexander Pritzel
Daan Wierstra
David Silver
Jonathan J. Hunt
Nicolas Heess
Timothy P. Lillicrap
Tom Erez
Yuval Tassa
introducedInPaper DDPG (self-link, surface differs)
surface form: Continuous control with deep reinforcement learning
introducedInYear 2015
optimizationMethod gradient descent
outputOfActor continuous action
outputOfCritic Q-value
policyType deterministic policy
stabilityTechnique experience replay
target networks
trainingParadigm off-policy learning
updateType bootstrapped TD learning
uses Ornstein–Uhlenbeck process
surface form: Ornstein-Uhlenbeck noise

actor network
critic network
experience replay buffer
exploration noise process
soft target updates
target actor network
target critic network
usesFunctionApproximator deep neural network
valueFunctionType action-value function
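
Several of the components listed in the statements above (Ornstein-Uhlenbeck exploration noise, the experience replay buffer, and soft target updates) can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the names `OUNoise`, `ReplayBuffer`, and `soft_update` are hypothetical, and the default values (`theta`, `sigma`, `dt`, `capacity`, `tau`, the seeds) are assumptions chosen for illustration.

```python
import random
from collections import deque

import numpy as np


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""

    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        # theta, sigma, dt, and seed are illustrative assumptions.
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.x = np.zeros(dim)

    def sample(self):
        # Euler-Maruyama step: revert toward zero, then add Gaussian noise.
        drift = -self.theta * self.x * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.x.shape)
        self.x = self.x + drift + diffusion
        return self.x


class ReplayBuffer:
    """Fixed-size store of transitions, sampled uniformly at random."""

    def __init__(self, capacity=1_000_000, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = self.rng.sample(list(self.buffer), batch_size)
        # Transpose the list of transitions into one array per field.
        return tuple(np.asarray(field) for field in zip(*batch))

    def __len__(self):
        return len(self.buffer)


def soft_update(target_params, online_params, tau=0.001):
    """Soft target update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]
```

In a training loop, the actor's deterministic output would be perturbed with `OUNoise.sample()` before acting, each transition stored with `ReplayBuffer.add`, and `soft_update` applied to both the target actor and the target critic after every gradient step.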

Referenced by (7)

Full triples — surface form annotated when it differs from this entity's canonical label.

OpenAI Baselines implementsAlgorithm DDPG
this entity surface form: Deep Deterministic Policy Gradient
DDPG fullName DDPG (self-link, surface differs)
this entity surface form: Deep Deterministic Policy Gradient
DDPG introducedInPaper DDPG (self-link, surface differs)
this entity surface form: Continuous control with deep reinforcement learning
Hindsight Experience Replay compatibleWith DDPG
this entity surface form: Deep Deterministic Policy Gradient