Q-learning
E455376
model-free reinforcement learning method
reinforcement learning algorithm
temporal-difference learning method
Q-learning is a model-free, off-policy reinforcement learning algorithm that learns an action-value function estimating the expected cumulative reward of each state-action pair; a policy is then derived by acting greedily on those estimates.
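A minimal sketch of the tabular algorithm described above, assuming a hypothetical discrete environment object whose `reset()` returns a state index and whose `step(action)` returns `(next_state, reward, done)`; the function name, interface, and default parameters are illustrative, not from this entity's statements:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))          # tabular representation
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)        # assumed interface, see lead-in
            # Off-policy TD update toward the greedy next-state value:
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            target = r + (0.0 if done else gamma * float(np.max(Q[s_next])))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q  # a greedy policy reads pi(s) = argmax over a of Q[s, a]
```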
Observed surface forms (1)
| Surface form | Occurrences |
|---|---|
| Double Q-learning | 1 |
Statements (47)
| Predicate | Object |
|---|---|
| instanceOf | model-free reinforcement learning method; reinforcement learning algorithm; temporal-difference learning method |
| assumes | discrete action space in basic form |
| canBeExtendedTo | Deep Q-learning |
| canBeImplementedWith | tabular representation |
| canHandle | stochastic rewards; stochastic transitions |
| canUseExplorationStrategy | epsilon-greedy policy; softmax action selection |
| canUseFunctionApproximation | linear function approximator; neural network |
| convergesUnderConditions | Markov decision process; decaying learning rate; sufficient exploration |
| describedInPaper | Q-learning |
| differsFrom | SARSA (Q-learning is off-policy; SARSA is on-policy) |
| doesNotRequire | model of environment dynamics |
| estimates | expected cumulative reward |
| hasAuthor | Christopher J. C. H. Watkins |
| hasCoAuthor | Peter Dayan |
| hasKeyEquation | Q(s,a) ← Q(s,a) + α[r + γ max_{a'} Q(s',a') − Q(s,a)] |
| isModelFree | true |
| isOffPolicy | true |
| isPartOf | reinforcement learning field |
| isRelatedTo | SARSA |
| isSensitiveTo | exploration schedule; learning rate choice; reward scaling |
| isUsedFor | optimal policy learning |
| isUsedIn | autonomous decision-making; game playing; resource allocation; robotics control |
| learns | action-value function |
| operatesOn | state-action pairs |
| policyDerivedBy | greedy action selection over Q-values |
| publicationYear | 1992 |
| publishedInJournal | Machine Learning |
| requires | reward signal |
| solves | Markov decision process control problems |
| updatesFrom | sample transitions |
| usesDiscountFactor | gamma |
| usesLearningRateParameter | alpha |
| usesMaxOperatorOver | next-state action values |
| usesUpdateRule | Bellman optimality equation (see the note after this table) |
| usesValueFunction | true |
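As a reading aid (not part of the source triples): the hasKeyEquation update is a sampled, incremental form of the Bellman optimality equation named in the usesUpdateRule row.

```latex
% Bellman optimality equation for the optimal action-value function Q*:
Q^{*}(s,a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^{*}(s',a') \,\middle|\, s, a \right]

% Q-learning replaces the expectation with one sampled transition (s, a, r, s')
% and takes a step of size alpha toward the sampled target:
Q(s,a) \leftarrow Q(s,a) + \alpha \left[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```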
Referenced by (2)
Full triples are listed with the surface form annotated when it differs from this entity's canonical label.
This entity's surface form: Double Q-learning
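The referencing entity, Double Q-learning (van Hasselt, 2010), mitigates the overestimation bias introduced by the max operator by keeping two value tables and decoupling action selection from evaluation. A minimal sketch of that single-step update under the same assumed interface as the sketch above; the function name and defaults are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def double_q_update(Qa, Qb, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One Double Q-learning step: select the greedy next action with one
    table but evaluate it with the other, reducing overestimation bias."""
    # Randomly pick which table to update this step.
    if rng.random() < 0.5:
        Q_sel, Q_eval = Qa, Qb
    else:
        Q_sel, Q_eval = Qb, Qa
    a_star = int(np.argmax(Q_sel[s_next]))       # select with one table
    target = r + (0.0 if done else gamma * Q_eval[s_next, a_star])  # evaluate with the other
    Q_sel[s, a] += alpha * (target - Q_sel[s, a])
```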