REINFORCE
E426681
REINFORCE is a classic Monte Carlo policy gradient algorithm in reinforcement learning that optimizes stochastic policies by estimating gradients from sampled returns.
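As a rough illustration of what "estimating gradients from sampled returns" can look like, here is a minimal sketch on a toy two-armed bandit with a state-free softmax policy. The environment, the reward probabilities, the hyperparameters, and all function names (`softmax`, `discounted_returns`, `reinforce_bandit`) are assumptions made for this example and are not taken from this entry. Each episode is sampled in full, the discounted return G_t is computed for every step, and the parameters move along the sum of G_t times the gradient of log pi(a_t).

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_bandit(episodes=2000, steps=5, lr=0.02, gamma=0.99, seed=0):
    """Train a two-action softmax policy with a REINFORCE-style update."""
    rng = random.Random(seed)
    reward_means = [0.2, 0.8]  # hypothetical Bernoulli reward probabilities
    theta = [0.0, 0.0]         # one preference per action
    for _ in range(episodes):
        # Sample one complete episode under the current policy.
        probs = softmax(theta)
        actions, rewards = [], []
        for _ in range(steps):
            a = 0 if rng.random() < probs[0] else 1
            r = 1.0 if rng.random() < reward_means[a] else 0.0
            actions.append(a)
            rewards.append(r)
        returns = discounted_returns(rewards, gamma)
        # Monte Carlo gradient estimate: sum_t G_t * grad log pi(a_t).
        grad = [0.0, 0.0]
        for a, g in zip(actions, returns):
            for k in range(2):
                # For a softmax policy: d log pi(a) / d theta_k = 1[a == k] - pi(k)
                grad[k] += g * ((1.0 if k == a else 0.0) - probs[k])
        # Gradient ascent on expected return.
        theta = [t + lr * gk for t, gk in zip(theta, grad)]
    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_bandit())
```

This sketch uses no baseline, so the gradient estimate is noisy; subtracting a baseline from G_t is a common way to reduce that variance, though it is left out here to keep the example short.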
Referenced by (1)
Full triples — surface form annotated when it differs from this entity's canonical label.