Triple

T27556890
Position   Surface form                             Disambiguated ID  Type / Status
Subject    Whittle index                            E695660           entity
Predicate  instanceOf                               P0                FINISHED
Object     policy for restless multi-armed bandits  C52920            CONCEPT FINISHED

How this triple was built (1 step)

Every LLM step that produced this triple, in pipeline order — named-entity classification, the disambiguation choices (the exact options shown, with the pick highlighted), and the generated description. The batch + timestamp of each is in the Provenance table below.

CD: Concept disambiguation (gpt-5-mini-2025-08-07)
Target class: policy for restless multi-armed bandits
Context triple: [Whittle index, instanceOf, policy for restless multi-armed bandits]
  • A. object in optimal stopping theory
    An object in optimal stopping theory is an abstract entity (such as a stochastic process, payoff function, or stopping rule) whose evolution or evaluation over time determines when it is best to stop observing and take an action to maximize expected reward or minimize expected cost.
  • B. Monte Carlo reinforcement learning algorithm
    A Monte Carlo reinforcement learning algorithm is a method that learns optimal policies by estimating value functions from complete, sampled episodes of experience without requiring a model of the environment’s dynamics.
  • C. value-based reinforcement learning method
    A value-based reinforcement learning method is an approach that learns a value function estimating expected future rewards for states or state-action pairs and derives a policy by selecting actions that maximize these estimated values.
  • D. policy gradient algorithm
    A policy gradient algorithm is a reinforcement learning method that directly optimizes a parameterized policy by estimating and following the gradient of expected cumulative reward with respect to the policy parameters.
  • E. model-based reinforcement learning algorithm
    A model-based reinforcement learning algorithm is a decision-making method that learns or uses an explicit model of the environment’s dynamics to plan and select actions that maximize long-term rewards.
  • F. None of above. (chosen)

Provenance (1 batch)

The batch behind each pipeline step, in order, with when it ran. Timestamps are batch-level — stages were processed in waves, so the object chain (NER → NED1 → NEDg → NED2) reads in order, but predicate / elicitation batches can sit in a different wave.

Step      Stage        Batch ID                                Status     When
creating  Elicitation  batch_69ef5387e97c8190a9dab040d21cd048  completed  April 27, 2026, 12:16 p.m.
Created at: April 27, 2026, 1:37 p.m.