TD3
E426680
UNEXPLORED
TD3 (Twin Delayed Deep Deterministic Policy Gradient) is an off-policy deep reinforcement learning algorithm that improves upon DDPG by reducing overestimation bias and stabilizing training for continuous control tasks.
Referenced by (2)
| Subject (surface form when different) | Predicate |
|---|---|
|
Stable Baselines
→
|
supportsAlgorithm |
|
TF-Agents
→
|
supportsAlgorithmFamily |