TD3

E426680 UNEXPLORED

TD3 (Twin Delayed Deep Deterministic Policy Gradient) is an off-policy deep reinforcement learning algorithm for continuous control tasks. It improves upon DDPG by reducing Q-value overestimation bias with twin critics, and by stabilizing training with delayed policy updates and target policy smoothing.
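As a rough illustration of how TD3 is usually described as curbing overestimation, the minimal Python sketch below computes a single bootstrapped target from the smaller of two target-critic estimates, after adding clipped noise to the target action. It also includes a helper for the delayed actor update. All names here (`td3_target`, `should_update_actor`, the callables passed in), the scalar action, and the default hyperparameter values are assumptions made for illustration. They are not taken from this entry or from the API of any library.

```python
import random


def clip(x, lo, hi):
    """Clamp a scalar to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done,
               target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               max_action=1.0, rng=random):
    """Sketch of a TD3-style target for one transition (scalar action).

    target_actor, target_q1 and target_q2 are hypothetical callables
    standing in for the target networks.
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise,
                       -max_action, max_action)

    # Clipped double-Q: use the smaller of the two critic estimates.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))

    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * q_next


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: refresh the actor every policy_delay steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    # Toy stand-ins for the target networks (assumed, for demonstration only).
    actor = lambda s: 0.5
    q1 = lambda s, a: 10.0
    q2 = lambda s, a: 8.0
    y = td3_target(1.0, next_state=None, done=False,
                   target_actor=actor, target_q1=q1, target_q2=q2,
                   gamma=0.9, noise_std=0.0)
    print(y)
```

In this toy call the two critics disagree (10.0 versus 8.0), and the target is built from the lower estimate, which is the mechanism intended to counter overestimation. In a full training loop the critics would be updated toward this target at every step, while the actor and target networks would be updated only when `should_update_actor` returns true.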


Referenced by (2)
Subject (surface form when different)    Predicate
Stable Baselines                         supportsAlgorithm
TF-Agents                                supportsAlgorithmFamily
