Triple

T7027406
Position | Surface form | Disambiguated ID | Type / Status
Subject | Generalized Advantage Estimation | E163182 | entity
Predicate | relatedTo | P37 | FINISHED
Object | TD(lambda) | E636114 | NE FINISHED
Object description: TD(λ) is a temporal-difference reinforcement learning algorithm that blends multi-step returns using a decay parameter λ to efficiently estimate value functions from bootstrapped experience.
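
Illustration (not part of the record): the sketch below shows one common way to compute the forward-view λ-return that TD(λ) targets, and how Generalized Advantage Estimation relates to it. The function names, the trajectory layout (values holds one more entry than rewards, with the last entry used as the bootstrap value), and the default gamma and lam are assumptions made for this example, not anything stated in the triple.

```python
def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Forward-view lambda-returns via the backward recursion
    G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}).

    Assumed layout: len(values) == len(rewards) + 1, and values[-1]
    is the bootstrap value of the state after the last reward.
    """
    returns = [0.0] * len(rewards)
    next_return = values[-1]  # bootstrap from the final value estimate
    for t in reversed(range(len(rewards))):
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE via the recursion A_t = delta_t + gamma * lam * A_{t+1},
    where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    advantages = [0.0] * len(rewards)
    next_adv = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        next_adv = delta + gamma * lam * next_adv
        advantages[t] = next_adv
    return advantages
```

Under these assumptions, lam=0 reduces the λ-return to the one-step TD target, lam=1 gives the discounted return bootstrapped at the end of the trajectory, and the GAE advantage at each step equals the λ-return minus V(s_t), which is one way to read the relatedTo link recorded here.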

Provenance (5 batches)

Stage | Batch ID | Job type | Status
creating | batch_69c6885d691c81908cf7d31083113886 | elicitation | completed
NER | batch_69c6e1fee32081908eff988b18daa6d0 | ner | completed
NED1 | batch_69c77588285481909799a2bb76921b9a | ned_source_triple | completed
NED2 | batch_69c77807d33c8190bb3e236829f06071 | ned_description | completed
NEDg | batch_69c777902cbc8190b24ee5e441c5607e | nedg | completed
Created at: March 27, 2026, 2:35 p.m.