Q-learning

E455376

Q-learning is a model-free reinforcement learning algorithm that learns an action-value function to optimize decision-making by estimating the expected cumulative reward for each state-action pair.
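The action-value update at the core of the algorithm (the `hasKeyEquation` statement below) can be sketched as a single tabular update step. This is an illustrative implementation; the function name and parameter defaults are not from the source:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])   # max operator over next-state action values
    Q[s, a] += alpha * (td_target - Q[s, a])    # temporal-difference correction
    return Q
```

Because the target uses `max` over the next state's action values rather than the action the behavior policy actually takes, the update is off-policy.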


Observed surface forms (1)

Surface form Occurrences
Double Q-learning 1

Statements (47)

Predicate Object
instanceOf model-free reinforcement learning method
reinforcement learning algorithm
temporal-difference learning method
assumes discrete action space in basic form
canBeExtendedTo Deep Q-learning
canBeImplementedWith tabular representation
canHandle stochastic rewards
stochastic transitions
canUseExplorationStrategy epsilon-greedy policy
softmax action selection
canUseFunctionApproximation linear function approximator
neural network
convergesUnderConditions Markov decision process
decaying learning rate
sufficient exploration
describedInPaper Q-learning
differsFrom SARSA (Q-learning is off-policy; SARSA is on-policy)
doesNotRequire model of environment dynamics
estimates expected cumulative reward
hasAuthor Christopher J. C. H. Watkins
hasCoAuthor Peter Dayan
hasKeyEquation Q(s,a) ← Q(s,a) + α[r + γ max_{a'} Q(s',a') − Q(s,a)]
isModelFree true
isOffPolicy true
isPartOf reinforcement learning field
isRelatedTo SARSA
isSensitiveTo exploration schedule
learning rate choice
reward scaling
isUsedFor optimal policy learning
isUsedIn autonomous decision-making
game playing
resource allocation
robotics control
learns action-value function
operatesOn state-action pairs
policyDerivedBy greedy action selection over Q-values
publicationYear 1992
publishedInJournal Machine Learning
requires reward signal
solves Markov decision process control problems
updatesFrom sample transitions
usesDiscountFactor gamma
usesLearningRateParameter alpha
usesMaxOperatorOver next-state action values
usesUpdateRule Bellman optimality equation
usesValueFunction true
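Several statements above (epsilon-greedy exploration, the max operator over next-state action values, and a greedy policy derived from Q-values) combine into a short training loop. The chain MDP below is a toy environment invented for illustration; everything else follows the update rule stated above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic chain MDP (illustrative only, not from the source):
# states 0..3, actions 0 (left) / 1 (right); reward 1.0 on reaching state 3.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3

def step(s, a):
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection over Q-values
            a = rng.integers(N_ACTIONS) if rng.random() < epsilon else int(np.argmax(Q[s]))
            s_next, r, done = step(s, a)
            # off-policy TD update: bootstrap with max over next-state action values
            Q[s, a] += alpha * (r + gamma * (0.0 if done else np.max(Q[s_next])) - Q[s, a])
            s = s_next
    return Q

Q = train()
policy = np.argmax(Q, axis=1)  # greedy policy derived from the learned Q-values
```

With a decaying learning rate and sufficient exploration the Q-values converge to the optimal action values; here the greedy policy in states 0-2 should select "right".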

Referenced by (2)

Full triples — surface form annotated when it differs from this entity's canonical label.

Double DQN basedOn Q-learning
Double DQN inspiredBy Q-learning
this entity surface form: Double Q-learning
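Since the entity is referenced under the surface form "Double Q-learning", a sketch of that variant's update may be useful: it keeps two tables and decouples action selection from evaluation to reduce the overestimation bias introduced by the max operator (van Hasselt, 2010). The function signature is illustrative:

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=None):
    """Double Q-learning step: select the next action with one table,
    evaluate it with the other, updating each table half the time."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))                               # select with QA
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])   # evaluate with QB
    else:
        a_star = int(np.argmax(QB[s_next]))                               # select with QB
        QB[s, a] += alpha * (r + gamma * QA[s_next, a_star] - QB[s, a])   # evaluate with QA
```

Acting greedily with respect to `QA + QB` (or their average) recovers a single policy; Double DQN applies the same decoupling idea with neural-network function approximation.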