REINFORCE

E426681 UNEXPLORED

REINFORCE is a classic Monte Carlo policy gradient algorithm in reinforcement learning that optimizes stochastic policies by estimating gradients from sampled returns.
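
As a minimal sketch of the idea, the following assumes a toy multi-armed bandit in which each episode is a single action and the sampled return is a noisy reward. The function name `reinforce_bandit`, the softmax parameterization, the learning rate, and the noise level are all illustrative assumptions, not part of this entry. Each update moves the policy parameters along the sampled return times the gradient of the log-probability of the chosen action, with no baseline.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """Hypothetical helper: REINFORCE-style updates on a toy bandit.

    arm_means: assumed mean reward of each arm (illustrative values).
    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters, one per arm
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Monte Carlo return of this one-step episode (noisy reward).
        ret = arm_means[action] + rng.gauss(0.0, 0.1)
        # For a softmax policy, d/d theta_k log pi(action) = 1[k == action] - probs[k].
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * ret * grad_log
    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_bandit([0.2, 1.0]))
```

In this setting the policy should come to favor the arm with the higher mean reward. Practical implementations usually subtract a baseline from the return to reduce the variance of the gradient estimate.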

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.