Triple

T27556890

Position	Surface form	Disambiguated ID	Type / Status
Subject	Whittle index	`E695660`	entity
Predicate	instanceOf	`P0`	FINISHED
Object	policy for restless multi-armed bandits	`C52920`	CONCEPT / FINISHED

How this triple was built (1 step)

Every LLM step that produced this triple, in pipeline order — named-entity classification, the disambiguation choices (the exact options shown, with the pick highlighted), and the generated description. The batch + timestamp of each is in the Provenance table below.

CD: Concept disambiguation (gpt-5-mini-2025-08-07)

Target class: policy for restless multi-armed bandits
Context triple: [Whittle index, instanceOf, policy for restless multi-armed bandits]

A. object in optimal stopping theory
An object in optimal stopping theory is an abstract entity (such as a stochastic process, payoff function, or stopping rule) whose evolution or evaluation over time determines when it is best to stop observing and take an action to maximize expected reward or minimize expected cost.
B. Monte Carlo reinforcement learning algorithm
A Monte Carlo reinforcement learning algorithm is a method that learns optimal policies by estimating value functions from complete, sampled episodes of experience without requiring a model of the environment’s dynamics.
C. value-based reinforcement learning method
A value-based reinforcement learning method is an approach that learns a value function estimating expected future rewards for states or state-action pairs and derives a policy by selecting actions that maximize these estimated values.
D. policy gradient algorithm
A policy gradient algorithm is a reinforcement learning method that directly optimizes a parameterized policy by estimating and following the gradient of expected cumulative reward with respect to the policy parameters.
E. model-based reinforcement learning algorithm
A model-based reinforcement learning algorithm is a decision-making method that learns or uses an explicit model of the environment’s dynamics to plan and select actions that maximize long-term rewards.
F. None of the above (chosen)
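
For readers who want to handle this step programmatically, the record above can be sketched in Python roughly as follows. This is a minimal illustration, not the pipeline's own code: the names `Option`, `DisambiguationStep`, `matched_label`, and `FALLBACK` are hypothetical, and only the target class, context triple, option labels, and the pick are taken from this page.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical constant: the letter of the "None of the above" choice on this page.
FALLBACK = "F"


@dataclass(frozen=True)
class Option:
    letter: str
    label: str


@dataclass(frozen=True)
class DisambiguationStep:
    target_class: str
    context_triple: Tuple[str, str, str]
    candidates: Tuple[Option, ...]
    chosen: str  # letter of the pick

    def __post_init__(self) -> None:
        # Reject picks that are neither a listed candidate nor the fallback.
        valid = {o.letter for o in self.candidates} | {FALLBACK}
        if self.chosen not in valid:
            raise ValueError(f"unknown pick: {self.chosen!r}")

    def matched_label(self) -> Optional[str]:
        """Return the chosen candidate's label, or None if the fallback was picked."""
        for o in self.candidates:
            if o.letter == self.chosen:
                return o.label
        return None


step = DisambiguationStep(
    target_class="policy for restless multi-armed bandits",
    context_triple=(
        "Whittle index",
        "instanceOf",
        "policy for restless multi-armed bandits",
    ),
    candidates=(
        Option("A", "object in optimal stopping theory"),
        Option("B", "Monte Carlo reinforcement learning algorithm"),
        Option("C", "value-based reinforcement learning method"),
        Option("D", "policy gradient algorithm"),
        Option("E", "model-based reinforcement learning algorithm"),
    ),
    chosen="F",
)

print(step.matched_label())  # None: no listed candidate was accepted
```

In this sketch a `None` result only records that none of the five candidates was accepted for the target class; what the pipeline does next with such a pick is not stated on this page.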

Provenance (1 batch)

The batch behind each pipeline step, in order, with when it ran. Timestamps are batch-level — stages were processed in waves, so the object chain (NER → NED1 → NEDg → NED2) reads in order, but predicate / elicitation batches can sit in a different wave.

Step	Stage	Batch ID	Status	When
creating	Elicitation	`batch_69ef5387e97c8190a9dab040d21cd048`	completed	April 27, 2026, 12:16 p.m.

Created at: April 27, 2026, 1:37 p.m.