Triple
T27556890
| Position | Surface form | Disambiguated ID | Type / Status |
|---|---|---|---|
| Subject | Whittle index | E695660 | entity |
| Predicate | instanceOf | P0 | FINISHED |
| Object | policy for restless multi-armed bandits | C52920 | CONCEPT FINISHED |
How this triple was built (1 step)
Every LLM step that produced this triple, in pipeline order — named-entity classification, the disambiguation choices (the exact options shown, with the pick highlighted), and the generated description. The batch + timestamp of each is in the Provenance table below.
Concept disambiguation
gpt-5-mini-2025-08-07
Target class: policy for restless multi-armed bandits
Context triple: [Whittle index, instanceOf, policy for restless multi-armed bandits]
- A. object in optimal stopping theory
  An object in optimal stopping theory is an abstract entity (such as a stochastic process, payoff function, or stopping rule) whose evolution or evaluation over time determines when it is best to stop observing and take an action to maximize expected reward or minimize expected cost.
- B. Monte Carlo reinforcement learning algorithm
  A Monte Carlo reinforcement learning algorithm is a method that learns optimal policies by estimating value functions from complete, sampled episodes of experience without requiring a model of the environment’s dynamics.
- C. value-based reinforcement learning method
  A value-based reinforcement learning method is an approach that learns a value function estimating expected future rewards for states or state-action pairs and derives a policy by selecting actions that maximize these estimated values.
- D. policy gradient algorithm
  A policy gradient algorithm is a reinforcement learning method that directly optimizes a parameterized policy by estimating and following the gradient of expected cumulative reward with respect to the policy parameters.
- E. model-based reinforcement learning algorithm
  A model-based reinforcement learning algorithm is a decision-making method that learns or uses an explicit model of the environment’s dynamics to plan and select actions that maximize long-term rewards.
- F. None of above. (chosen)
Provenance (1 batch)
The batch behind each pipeline step, in order, with when it ran. Timestamps are batch-level — stages were processed in waves, so the object chain (NER → NED1 → NEDg → NED2) reads in order, but predicate / elicitation batches can sit in a different wave.
| Step | Stage | Batch ID | Status | When |
|---|---|---|---|---|
| creating | Elicitation | batch_69ef5387e97c8190a9dab040d21cd048 | completed | April 27, 2026, 12:16 p.m. |
Created at: April 27, 2026, 1:37 p.m.