TD(lambda)
E636114
reinforcement learning algorithm
temporal-difference learning algorithm
value function learning method
TD(λ) is a temporal-difference reinforcement learning algorithm that blends multi-step returns using a decay parameter λ, efficiently estimating value functions from experience with bootstrapped targets.
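For reference, the blending of multi-step returns can be made precise with the standard λ-return definition (as given in *Reinforcement Learning: An Introduction*, which the statements below cite; these formulas are textbook-standard and not taken from this entity record):

$$
G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)},
\qquad
G_t^{(n)} = \sum_{k=0}^{n-1} \gamma^{k} R_{t+k+1} + \gamma^{n} V(S_{t+n}),
$$

where λ = 0 recovers the one-step TD(0) target and λ = 1 recovers the Monte Carlo return on episodic tasks.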
Observed surface forms (1)
| Surface form | Occurrences |
|---|---|
| TD(λ) | 0 |
Statements (47)
| Predicate | Object |
|---|---|
| instanceOf | reinforcement learning algorithm; temporal-difference learning algorithm; value function learning method |
| aimsTo | minimize prediction error of value function |
| approaches | Monte Carlo method |
| assumes | stationary environment dynamics |
| blends | Monte Carlo returns; n-step returns; one-step TD returns |
| canBeCombinedWith | function approximation; linear value function approximation; nonlinear value function approximation |
| canEstimate | action-value function |
| category | model-free reinforcement learning method |
| computes | TD error δt |
| controlsBiasVarianceTradeoffWith | λ |
| describedIn | Reinforcement Learning: An Introduction |
| estimates | state-value function |
| generalizes | TD(0) |
| hasHyperparameter | λ |
| hasParameter | discount factor γ; learning rate; value function representation; λ |
| hasView | backward view; forward view |
| implements | backward view of multi-step returns |
| introducedIn | reinforcement learning literature |
| isBasedOn | TD(0) |
| isRelatedTo | Q(λ); SARSA(λ); eligibility-trace methods |
| isUsedFor | policy evaluation; prediction problems in reinforcement learning |
| operatesOn | sequences of states and rewards |
| popularizedBy | Richard S. Sutton |
| propagates | TD errors backward through time |
| reducesTo | Monte Carlo evaluation when λ = 1 (under episodic tasks and certain conditions); TD(0) when λ = 0 |
| requires | Markov decision process setting |
| updates | value estimates after each time step |
| uses | bootstrapped targets; temporal-difference error |
| usesConcept | bootstrapping; eligibility traces; multi-step returns; temporal-difference learning |
Referenced by (1)
Full triples — surface form annotated when it differs from this entity's canonical label.