Universal Value Function Approximators

E441116

goal-conditioned value function model; reinforcement learning framework

Universal Value Function Approximators (UVFA) are a reinforcement learning framework that generalizes value functions over both states and goals, enabling agents to learn goal-conditioned behaviors in a unified way.
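As a hedged illustration of what "generalizes value functions over both states and goals" can mean, the goal-conditioned value of a policy and its parametric approximation may be written as below. The symbols \pi (policy), r_g (goal-dependent reward), \gamma (constant discount) and \theta (parameters) are assumptions introduced here for illustration; this page itself only gives the form V(s,g).

```latex
V^{\pi}(s, g) = \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_g(s_t, a_t, s_{t+1}) \;\middle|\; s_0 = s \right],
\qquad
V(s, g; \theta) \approx V^{\pi}(s, g)
```

A single parameter vector \theta is shared across all goals g, which is what allows estimates to transfer between goals when they have common structure.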

All labels observed (1)

Label	Occurrences
Universal Value Function Approximators (canonical)	1

How this entity was disambiguated

This entity first appeared as the object of triple T4470554 — resolving that mention is where its identity was fixed. The disambiguator weighed these candidate entities and picked the highlighted one (or “None”, minting a new entity). This is how homonymy is resolved: the same surface form can point to different entities.

NED1 Entity disambiguation (via context triple) gpt-5-mini-2025-08-07

Target entity: Universal Value Function Approximators
Context triple: [Hindsight Experience Replay, relatedTo, Universal Value Function Approximators]

A. Generalized Advantage Estimation
Generalized Advantage Estimation is a reinforcement learning technique that reduces variance and improves sample efficiency in policy gradient methods by cleverly estimating the advantage function over multiple time scales.
B. Proximal Policy Optimization
Proximal Policy Optimization is a popular reinforcement learning algorithm that improves policy gradient methods by using clipped objective functions to achieve stable and efficient training.
C. Asynchronous Advantage Actor-Critic
Asynchronous Advantage Actor-Critic is a deep reinforcement learning algorithm that trains multiple parallel agents to learn both policy and value functions efficiently and stably.
D. Atari deep Q-network
The Atari deep Q-network is a pioneering deep reinforcement learning system that learned to play a wide range of Atari 2600 video games directly from raw pixels at human-level or better performance.
E. Asynchronous Methods for Deep Reinforcement Learning
"Asynchronous Methods for Deep Reinforcement Learning" is a 2016 DeepMind paper that introduced asynchronous parallel training techniques for deep reinforcement learning, most notably the A3C algorithm, enabling more stable and efficient learning without specialized hardware.
F. None of the above. (chosen)
G. Unsure - the case is ambiguous/there is not enough information to decide.

NED2 Entity disambiguation (via description) gpt-5-mini-2025-08-07

Target entity: Universal Value Function Approximators
Target entity description: Universal Value Function Approximators (UVFA) are a reinforcement learning framework that generalizes value functions over both states and goals, enabling agents to learn goal-conditioned behaviors in a unified way.

A. Generalized Advantage Estimation
Generalized Advantage Estimation is a reinforcement learning technique that reduces variance and improves sample efficiency in policy gradient methods by cleverly estimating the advantage function over multiple time scales.
B. Proximal Policy Optimization
Proximal Policy Optimization is a popular reinforcement learning algorithm that improves policy gradient methods by using clipped objective functions to achieve stable and efficient training.
C. Asynchronous Advantage Actor-Critic
Asynchronous Advantage Actor-Critic is a deep reinforcement learning algorithm that trains multiple parallel agents to learn both policy and value functions efficiently and stably.
D. Atari deep Q-network
The Atari deep Q-network is a pioneering deep reinforcement learning system that learned to play a wide range of Atari 2600 video games directly from raw pixels at human-level or better performance.
E. Asynchronous Methods for Deep Reinforcement Learning
"Asynchronous Methods for Deep Reinforcement Learning" is a 2016 DeepMind paper that introduced asynchronous parallel training techniques for deep reinforcement learning, most notably the A3C algorithm, enabling more stable and efficient learning without specialized hardware.
F. None of the above. (chosen)

Statements (46)

Predicate	Object
instanceOf	goal-conditioned value function model; reinforcement learning framework
abbreviation	UVFA
addresses	lack of generalization across goals in standard value functions
approximatorType	parametric function approximator
assumes	shared structure across goals
citationVenue	Proceedings of the 32nd International Conference on Machine Learning
commonImplementation	neural network
compatibleWith	Q-learning; actor-critic methods; policy gradient methods
coreIdea	generalize value functions over both states and goals; represent value as a function of state and goal
enables	generalization to unseen goals; goal-conditioned policies; multi-goal reinforcement learning; transfer across goals
evaluationDomain	grid-world tasks; navigation tasks
evaluationMetric	performance on multiple goals
field	machine learning; reinforcement learning
formalization	V(s,g) as value function over state s and goal g
goalRepresentation	can be continuous; can be discrete
inputIncludes	goal representation; state representation
inspiredBy	universal function approximation in supervised learning
introducedBy	Daniel Horgan; David Silver; Karol Gregor; Tom Schaul
learningSignal	temporal-difference error
organization	DeepMind
outputRepresents	expected return for given state and goal
publicationTitle	Universal Value Function Approximators
publicationYear	2015
publishedIn	ICML 2015; International Conference on Machine Learning
relatedTo	goal-conditioned reinforcement learning; hindsight experience replay; successor features; universal policy approximators
usedFor	generalization over goal space; multi-task learning in reinforcement learning; transfer learning in reinforcement learning
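
Several of the statements above (state and goal as inputs, a parametric approximator, a temporal-difference learning signal, generalization to unseen goals) can be illustrated together with a minimal sketch. Everything in it is an assumption made for illustration and not the method of the original paper: the toy line-world, the fixed "walk toward the goal" policy, the constants, and especially the hand-chosen distance feature, which stands in for the "shared structure across goals" assumption. The approximator here is linear rather than a neural network.

```python
import numpy as np

N_STATES = 7   # states 0..6 on a line (hypothetical toy domain)
GAMMA = 0.9    # discount factor (assumed)
ALPHA = 0.5    # step size (assumed)

def features(s, g):
    """Joint state-goal features: one-hot encoding of the distance |s - g|.
    Hand-chosen so that different goals share parameters (an assumption)."""
    phi = np.zeros(N_STATES)
    phi[abs(s - g)] = 1.0
    return phi

def value(theta, s, g):
    """Linear approximator V(s, g; theta) = theta . phi(s, g)."""
    return float(theta @ features(s, g))

def step(s, g):
    """Fixed policy: move one cell toward the goal; reward 1 on arrival."""
    s_next = s + int(np.sign(g - s))
    done = s_next == g
    return s_next, (1.0 if done else 0.0), done

def train(train_goals, sweeps=200):
    """Semi-gradient TD(0) over all non-goal start states of each training goal."""
    theta = np.zeros(N_STATES)
    for _ in range(sweeps):
        for g in train_goals:
            for s in range(N_STATES):
                if s == g:
                    continue  # the goal state is terminal
                s_next, r, done = step(s, g)
                # TD target bootstraps from V(s', g) unless the episode ended.
                target = r if done else r + GAMMA * value(theta, s_next, g)
                td_error = target - value(theta, s, g)
                # For a linear model the gradient of V w.r.t. theta is phi(s, g).
                theta += ALPHA * td_error * features(s, g)
    return theta

# Train only on the two end goals, then query a goal never seen in training.
theta = train(train_goals=[0, 6])
print(value(theta, 1, 3))  # estimate for state 1 under held-out goal 3
```

In this sketch the estimate for the held-out goal 3 is reasonable only because the distance feature makes every goal look alike to the model; with features that do not encode shared structure (for example, an independent one-hot per state-goal pair), nothing would transfer to an unseen goal.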

How these facts were elicited

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

Hindsight Experience Replay → relatedTo → Universal Value Function Approximators
