SAC

E426679

deep reinforcement learning algorithm; off-policy reinforcement learning algorithm

SAC (Soft Actor-Critic) is a popular off-policy deep reinforcement learning algorithm that optimizes both expected return and policy entropy to achieve stable and efficient learning in continuous control tasks.
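The description above says that SAC optimizes both expected return and policy entropy. As a hedged illustration, here is the maximum-entropy objective and the soft Bellman backup as we understand them from the 2018 Haarnoja et al. paper. The notation is our assumption: \(\rho_\pi\) is the state-action distribution induced by the policy, \(\alpha\) is the temperature, and \(\gamma\) is the discount factor.

```latex
% Maximum-entropy objective: reward plus alpha-weighted policy entropy
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[\, r(s_t, a_t) + \alpha \, \mathcal{H}(\pi(\cdot \mid s_t)) \,\big],
\qquad
\mathcal{H}(\pi(\cdot \mid s)) = -\mathbb{E}_{a \sim \pi}\big[ \log \pi(a \mid s) \big]

% Soft Bellman backup used to train the soft Q-function
Q(s_t, a_t) \leftarrow r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}}\big[ V(s_{t+1}) \big],
\qquad
V(s) = \mathbb{E}_{a \sim \pi}\big[ Q(s, a) - \alpha \log \pi(a \mid s) \big]
```

Setting \(\alpha = 0\) recovers the ordinary expected-return objective. A larger \(\alpha\) pushes the stochastic policy toward more exploratory, higher-entropy behaviour.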


All labels observed (1)

Label	Occurrences
SAC (canonical)	2

How this entity was disambiguated

This entity first appeared as the object of triple T4277524, and resolving that mention is where its identity was fixed. The disambiguator weighed the candidate entities listed below and picked the one marked “chosen” (or “None of the above”, which mints a new entity). This is how homonymy is resolved: the same surface form can point to different entities.

NED1: Entity disambiguation (via context triple), model gpt-5-mini-2025-08-07

Target entity: SAC
Context triple: [TF-Agents, supportsAlgorithmFamily, SAC]

A. SAC
SAC is a NATO-led multinational program that provides participating nations with shared strategic airlift capabilities using a fleet of C-17 Globemaster III aircraft.
B. SACT
SACT is the Supreme Allied Commander Transformation, the NATO strategic commander responsible for leading the alliance’s military transformation and capability development.
C. SACS
SACS is a regional accrediting body in the United States that evaluates and certifies the quality and standards of educational institutions in the southern states.
D. SAC Steering Board
The SAC Steering Board is the multinational governing body that oversees and directs NATO’s Strategic Airlift Capability program on behalf of its participating nations.
E. SacRT
SacRT is the public transit agency serving the Sacramento, California metropolitan area with bus, light rail, and related transportation services.
F. None of the above. (chosen)
G. Unsure - the case is ambiguous/there is not enough information to decide.

NED2: Entity disambiguation (via description), model gpt-5-mini-2025-08-07

Target entity: SAC
Target entity description: SAC (Soft Actor-Critic) is a popular off-policy deep reinforcement learning algorithm that optimizes both expected return and policy entropy to achieve stable and efficient learning in continuous control tasks.

A. SAC
SAC is a NATO-led multinational program that provides participating nations with shared strategic airlift capabilities using a fleet of C-17 Globemaster III aircraft.
B. SACT
SACT is the Supreme Allied Commander Transformation, the NATO strategic commander responsible for leading the alliance’s military transformation and capability development.
C. SACS
SACS is a regional accrediting body in the United States that evaluates and certifies the quality and standards of educational institutions in the southern states.
D. SAC Steering Board
The SAC Steering Board is the multinational governing body that oversees and directs NATO’s Strategic Airlift Capability program on behalf of its participating nations.
E. SacRT
SacRT is the public transit agency serving the Sacramento, California metropolitan area with bus, light rail, and related transportation services.
F. None of the above. (chosen)

Statements (49)

Predicate	Object
instanceOf	deep reinforcement learning algorithm; off-policy reinforcement learning algorithm
abbreviation	SAC
advantageOverDeterministicMethods	better exploration via entropy maximization
aimsFor	sample-efficient learning; stable learning
canBe	model-free
category	actor-critic methods
commonlyEvaluatedOn	MuJoCo benchmarks; OpenAI Gym continuous control tasks
comparedWith	DDPG; TD3
criticUpdateBasedOn	soft Bellman backup
firstPublishedIn	"Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"
fullName	Soft Actor-Critic
hasVariant	automatic entropy tuning SAC; discrete SAC
implementedIn	PyTorch; TensorFlow
introducedIn	2018
is	maximum entropy reinforcement learning method
laterExtendedIn	"Soft Actor-Critic Algorithms and Applications"
learningSignal	temporal-difference error
objectiveIncludes	temperature parameter
optimizationObjective	expected return; policy entropy
policyOutput	distribution over continuous actions
policyType	stochastic policy
policyUpdateBasedOn	reparameterization trick
proposedBy	Aurick Zhou; Pieter Abbeel; Sergey Levine; Tuomas Haarnoja
sampleEfficiency	high
supports	continuous action spaces
temperatureParameterControls	entropy-return trade-off
trainingStability	high
typicalDomain	continuous control tasks
updateStyle	off-policy updates
usedIn	autonomous driving research; manipulation tasks; robotics control
uses	actor-critic architecture; entropy regularization; replay buffer; target networks
valueFunctionType	soft Q-function
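The statements above name the main ingredients of an SAC update: a soft Bellman backup for the critic, the reparameterization trick for the stochastic policy, a temperature parameter (optionally tuned automatically), and target networks. The sketch below shows the scalar arithmetic of each ingredient in plain Python for a single one-dimensional action, with no neural networks or autodiff. It is an illustration under assumptions, not a reference implementation. The tanh-squashed Gaussian policy, the minimum over two target critics, tuning log α instead of α, the target entropy of −1, and the default gamma, tau and lr values are taken from common SAC implementations and the later "Soft Actor-Critic Algorithms and Applications" paper as we recall them. All function names are hypothetical.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def sample_tanh_gaussian(mean, log_std, rng):
    """Reparameterized sample a = tanh(mean + std * eps), eps ~ N(0, 1).

    Returns the action and its log-probability under the squashed Gaussian.
    Because eps does not depend on mean or log_std, an autodiff framework
    could backpropagate through a (the reparameterization trick).
    """
    std = math.exp(log_std)
    eps = rng.gauss(0.0, 1.0)
    u = mean + std * eps  # pre-squash Gaussian sample
    a = math.tanh(u)  # squash into (-1, 1)
    # log N(u; mean, std), using (u - mean) / std == eps
    log_prob_u = -0.5 * (eps * eps + LOG_2PI) - log_std
    # change of variables for tanh: subtract log|da/du| = log(1 - tanh(u)^2);
    # the small constant guards against log(0) when tanh saturates
    log_prob = log_prob_u - math.log(1.0 - a * a + 1e-6)
    return a, log_prob


def soft_bellman_target(reward, done, next_q1, next_q2, next_log_prob, alpha, gamma=0.99):
    """Critic target y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    soft_next_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_next_value


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient step on L(log_alpha) = -log_alpha * mean(log_pi + target_entropy).

    (log_pi + target_entropy) is treated as a constant. alpha grows when the
    entropy estimate (-log_pi) is below target_entropy and shrinks when above.
    """
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - lr * grad


def polyak_update(target_params, online_params, tau=0.005):
    """Soft target-network update: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


if __name__ == "__main__":
    rng = random.Random(0)
    # next action a' ~ pi(.|s') for a single hypothetical transition
    next_a, next_logp = sample_tanh_gaussian(mean=0.1, log_std=math.log(0.5), rng=rng)
    alpha = 0.2
    y = soft_bellman_target(reward=1.0, done=0.0, next_q1=10.0, next_q2=9.5,
                            next_log_prob=next_logp, alpha=alpha)
    print(f"a'={next_a:.3f}  log pi(a'|s')={next_logp:.3f}  target y={y:.3f}")
    # target_entropy = -dim(action space) = -1 is a common heuristic (assumption)
    new_log_alpha = update_log_alpha(math.log(alpha), [next_logp], target_entropy=-1.0)
    print(f"alpha: {alpha:.4f} -> {math.exp(new_log_alpha):.4f}")
    print(polyak_update([0.0, 1.0], [1.0, 0.0]))
```

In a full SAC agent these scalars become batched tensors. Q1' and Q2' come from the target critics, and the actor is trained by minimizing α log π(a|s) − Q(s, a) through the reparameterized sample.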

How these facts were elicited

Referenced by (2)

Full triples. The surface form is annotated when it differs from this entity's canonical label.

TF-Agents → supportsAlgorithmFamily → SAC

Stable Baselines → supportsAlgorithm → SAC
