Triple

T4586020
Position   Surface form          Disambiguated ID  Type / Status
Subject    Double DQN            E101969           entity
Predicate  instanceOf            P0                FINISHED
Object     off-policy algorithm  C9067             CONCEPT FINISHED
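
For readers who want to handle this record programmatically, the table above could be modeled roughly as follows. This is only an illustrative sketch: the class names `Slot` and `Triple`, their field names, and the `as_tuple` helper are hypothetical and not part of the pipeline that produced this page. Only the values are taken from the record itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One row of the triple table (hypothetical structure)."""
    position: str          # "Subject", "Predicate", or "Object"
    surface_form: str      # text as it appeared in the source
    disambiguated_id: str  # ID assigned after disambiguation
    type_status: str       # contents of the "Type / Status" column


@dataclass(frozen=True)
class Triple:
    """A subject-predicate-object record (hypothetical structure)."""
    triple_id: str
    subject: Slot
    predicate: Slot
    obj: Slot

    def as_tuple(self):
        # Return the surface forms in (subject, predicate, object) order.
        return (
            self.subject.surface_form,
            self.predicate.surface_form,
            self.obj.surface_form,
        )


# Values copied from the record above.
triple = Triple(
    triple_id="T4586020",
    subject=Slot("Subject", "Double DQN", "E101969", "entity"),
    predicate=Slot("Predicate", "instanceOf", "P0", "FINISHED"),
    obj=Slot("Object", "off-policy algorithm", "C9067", "CONCEPT FINISHED"),
)

print(triple.as_tuple())
```

The tuple returned by `as_tuple` has the same shape as the "Context triple" shown in the disambiguation step below.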

How this triple was built (1 step)

Every LLM step that produced this triple, in pipeline order — named-entity classification, the disambiguation choices (the exact options shown, with the pick highlighted), and the generated description. The batch + timestamp of each is in the Provenance table below.

CD Concept disambiguation gpt-5-mini-2025-08-07
Target class: off-policy algorithm
Context triple: [Double DQN, instanceOf, off-policy algorithm]
  • A. actor-critic method
    An actor-critic method is a reinforcement learning approach that combines a policy model (actor) that selects actions with a value model (critic) that evaluates those actions to improve the policy.
  • B. model-based reinforcement learning algorithm
    A model-based reinforcement learning algorithm is a decision-making method that learns or uses an explicit model of the environment’s dynamics to plan and select actions that maximize long-term rewards.
  • C. value-based reinforcement learning method (chosen)
    A value-based reinforcement learning method is an approach that learns a value function estimating expected future rewards for states or state-action pairs and derives a policy by selecting actions that maximize these estimated values.
  • D. reinforcement learning library
    A reinforcement learning library is a software toolkit that provides algorithms, environments, and utilities to design, train, evaluate, and deploy agents that learn optimal behaviors through trial-and-error interactions with their environment.
  • E. experience replay method
    An experience replay method is a reinforcement learning technique that stores past agent-environment interactions in a memory buffer and reuses randomly sampled batches of these experiences to stabilize and improve learning efficiency.
  • F. None of above.

Provenance (1 batch)

The batch behind each pipeline step, in order, with when it ran. Timestamps are batch-level — stages were processed in waves, so the object chain (NER → NED1 → NEDg → NED2) reads in order, but predicate / elicitation batches can sit in a different wave.

Step      Stage        Batch ID                                Status     When
creating  Elicitation  batch_69bd43d4ce208190b53158c882b222e3  completed  March 20, 2026, 12:55 p.m.
Created at: March 20, 2026, 1:10 p.m.