Triple

T4586020

Position	Surface form	Disambiguated ID	Type / Status
Subject	Double DQN	`E101969`	entity
Predicate	instanceOf	`P0`	FINISHED
Object	off-policy algorithm	`C9067`	CONCEPT / FINISHED

How this triple was built (1 step)

Every LLM step that produced this triple, in pipeline order: named-entity classification, the disambiguation choices (the exact options shown, with the pick highlighted), and the generated description. The batch and timestamp of each step are listed in the Provenance table below.

CD: Concept disambiguation (gpt-5-mini-2025-08-07)

Target class: off-policy algorithm
Context triple: [Double DQN, instanceOf, off-policy algorithm]

A. actor-critic method
An actor-critic method is a reinforcement learning approach that combines a policy model (actor) that selects actions with a value model (critic) that evaluates those actions to improve the policy.
B. model-based reinforcement learning algorithm
A model-based reinforcement learning algorithm is a decision-making method that learns or uses an explicit model of the environment’s dynamics to plan and select actions that maximize long-term rewards.
C. value-based reinforcement learning method (chosen)
A value-based reinforcement learning method is an approach that learns a value function estimating expected future rewards for states or state-action pairs and derives a policy by selecting actions that maximize these estimated values.
D. reinforcement learning library
A reinforcement learning library is a software toolkit that provides algorithms, environments, and utilities to design, train, evaluate, and deploy agents that learn optimal behaviors through trial-and-error interactions with their environment.
E. experience replay method
An experience replay method is a reinforcement learning technique that stores past agent-environment interactions in a memory buffer and reuses randomly sampled batches of these experiences to stabilize and improve learning efficiency.
F. None of above.
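
For readers unfamiliar with the subject entity, the sketch below illustrates why Double DQN fits the definition in option C: it learns state-action values and bootstraps from a greedy action rather than from the action the behaviour policy took, which is also what makes it off-policy. This is a minimal illustration and not part of the pipeline record; the function name `double_dqn_target`, its parameters, and the example numbers are hypothetical.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Bootstrapped value target for one transition (illustrative only).

    q_online_next: online-network value estimates for each action in the next state.
    q_target_next: target-network value estimates for the same actions.
    """
    if done:
        # Terminal transition: there is no future value to bootstrap from.
        return reward
    # Select the greedy action with the online network ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... but evaluate that action with the target network.
    return reward + gamma * q_target_next[best_action]


# Hypothetical value estimates for a next state with three actions.
target = double_dqn_target(
    reward=1.0,
    done=False,
    q_online_next=[1.0, 3.0, 2.0],
    q_target_next=[0.5, 1.5, 4.0],
    gamma=0.5,
)
print(target)
```

Here the online estimates pick action 1, so the target uses the target-network value 1.5 for that action instead of the largest target-network value, 4.0, which a single-network maximum would have used.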

Provenance (1 batch)

The batch behind each pipeline step, in order, with the time it ran. Timestamps are batch-level: stages were processed in waves, so the object chain (NER → NED1 → NEDg → NED2) reads in order, but predicate and elicitation batches can fall in a different wave.

Step	Stage	Batch ID	Status	When
creating	Elicitation	`batch_69bd43d4ce208190b53158c882b222e3`	completed	March 20, 2026, 12:55 p.m.

Created at: March 20, 2026, 1:10 p.m.