TD(λ)

E636114

TD(λ) is a temporal-difference reinforcement learning algorithm that blends multi-step returns, weighted by a decay parameter λ, to estimate value functions efficiently from experience using bootstrapped targets.
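
For context, the blending the description refers to is the standard forward-view λ-return from the temporal-difference literature (the formula itself is not part of this entity's statements):

    G_t^\lambda = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}

where G_t^{(n)} is the n-step return. Setting λ = 0 leaves only the one-step TD target, and λ = 1 recovers the Monte Carlo return, matching the reducesTo statements below.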

Observed surface forms (1)

Surface form Occurrences
TD(λ) 0

Statements (47)

Predicate Object
instanceOf reinforcement learning algorithm
temporal-difference learning algorithm
value function learning method
aimsTo minimize prediction error of value function
approaches Monte Carlo method
assumes stationary environment dynamics
blends Monte Carlo returns
n-step returns
one-step TD returns
canBeCombinedWith function approximation
linear value function approximation
nonlinear value function approximation
canEstimate action-value function
category model-free reinforcement learning method
computes TD error δt
controlsBiasVarianceTradeoffWith λ
describedIn Reinforcement Learning: An Introduction
estimates state-value function
generalizes TD(0)
hasHyperparameter λ
hasParameter discount factor γ
learning rate
value function representation
λ
hasView backward view
forward view
implements backward view of multi-step returns (sketched in code after this table)
introducedIn reinforcement learning literature
isBasedOn TD(0)
isRelatedTo Q(λ)
SARSA(λ)
eligibility-trace methods
isUsedFor policy evaluation
prediction problems in reinforcement learning
operatesOn sequences of states and rewards
popularizedBy Richard S. Sutton
propagates TD errors backward through time
reducesTo Monte Carlo evaluation when λ = 1 (for episodic tasks with offline, end-of-episode updates)
TD(0) when λ = 0
requires Markov decision process setting
updates value estimates after each time step
uses bootstrapped targets
temporal-difference error
usesConcept bootstrapping
eligibility traces
multi-step returns
temporal-difference learning
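
The backward-view rows above (computes TD error δt, propagates TD errors backward through time, usesConcept eligibility traces) combine into a single update rule. Below is a minimal tabular sketch, assuming a small discrete environment exposing reset() and step() and a fixed policy; the names env, policy, and num_states are illustrative and not part of this entity's data.

    import numpy as np

    def td_lambda(env, policy, num_states, episodes=500,
                  alpha=0.1, gamma=0.99, lam=0.9):
        """Tabular TD(lambda) policy evaluation, backward view.

        env and policy are assumed interfaces: env.reset() -> state,
        env.step(action) -> (next_state, reward, done), policy(state) -> action.
        """
        V = np.zeros(num_states)              # state-value estimates
        for _ in range(episodes):
            e = np.zeros(num_states)          # eligibility traces, reset per episode
            s = env.reset()
            done = False
            while not done:
                s_next, r, done = env.step(policy(s))
                # TD error delta_t: bootstrapped one-step target minus current estimate
                delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
                e[s] += 1.0                   # accumulating trace for the visited state
                V += alpha * delta * e        # propagate delta_t backward through time
                e *= gamma * lam              # decay all traces by gamma * lambda
                s = s_next
        return V

With lam = 0, the trace vector is zeroed after every step, so only the current state is updated with the one-step TD error, recovering TD(0); with λ = 1 on episodic tasks, the updates approximate Monte Carlo evaluation, consistent with the reducesTo statements.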

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.