Adam optimizer

E182821

The Adam optimizer is a widely used stochastic gradient-based optimization method in machine learning that adapts the learning rate of each parameter using running estimates of the first and second moments of the gradients.


All labels observed (4)

Label Occurrences
Adam optimizer (canonical) 5
Adam optimization algorithm 2
AdamW 1

Statements (56)

Predicate Object
instanceOf adaptive learning rate method
optimization algorithm
stochastic gradient descent variant
baseAlgorithm stochastic gradient descent
category adaptive gradient method
commonlyUsedIn computer vision models
deep learning
natural language processing models
neural network training
reinforcement learning
defaultBeta1 0.9
defaultBeta2 0.999
defaultEpsilon 1e-8
defaultLearningRate 0.001
fullName Adam: A Method for Stochastic Optimization
surface form: Adaptive Moment Estimation
gradientRequirement first-order gradients
hasVariant AMSGrad

Adam optimizer (self-link; surface differs)
surface form: AdamW
hyperparameter beta1
beta2
epsilon
learning rate
weight decay (in some implementations)
implementedIn Chainer
JAX
Keras
MXNet
PyTorch
TensorFlow
fastai
inspiredBy AdaGrad
RMSProp
introducedBy Diederik P. Kingma
Jimmy Ba
introducedInPaper Adam: A Method for Stochastic Optimization
maintains per-parameter first moment vector
per-parameter second moment vector
optimizationTarget minimization of loss function
parameterUpdateDependsOn current gradient
first moment estimate
second moment estimate
performsBiasCorrection true
publicationYear 2014
relatedTo AdaDelta
AdaGrad
RMSProp
strength fast convergence in practice
suitableFor large-scale problems
non-stationary objectives
sparse gradients
updateRuleType adaptive learning rate
uses exponentially decaying averages of past gradients
exponentially decaying averages of past squared gradients
usesFirstMomentEstimate gradient mean
usesSecondMomentEstimate uncentered gradient variance
weakness can generalize worse than SGD with momentum in some settings
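
The statements above (per-parameter first and second moment vectors, exponentially decaying averages, bias correction, and the listed defaults) can be sketched as a single update step. This is a minimal illustration in plain Python, not the code of any library listed under implementedIn. The function names `adam_step` and `minimize_toy_quadratic`, the toy objective, and the demo learning rate of 0.1 are assumptions made for this example. The default arguments mirror defaultLearningRate, defaultBeta1, defaultBeta2 and defaultEpsilon.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated (params, m, v) after one Adam step; t counts from 1."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponentially decaying average of past gradients (first moment).
        m_i = beta1 * m_i + (1.0 - beta1) * g
        # Exponentially decaying average of past squared gradients (second moment).
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction: both averages start at zero and are biased toward it.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Per-parameter step, scaled by the root of the second moment estimate.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_toy_quadratic(steps=500, lr=0.1):
    """Hypothetical demo: minimize f(x) = sum((x_i - target_i)^2) with Adam."""
    target = [3.0, -2.0]
    x = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        # Analytic gradient of the toy objective.
        grads = [2.0 * (x_i - c) for x_i, c in zip(x, target)]
        x, m, v = adam_step(x, grads, m, v, t, lr=lr)
    return x
```

On the first step the bias-corrected estimates reduce to the gradient and its square, so each parameter moves by roughly the learning rate in the direction opposite to its gradient, regardless of the gradient's magnitude. Only first-order gradients are needed, matching gradientRequirement.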

Referenced by (9)

Full triples — surface form annotated when it differs from this entity's canonical label.

Jimmy Ba knownFor Adam optimizer
Dueling DQN usesOptimizationMethod Adam optimizer
RMSProp oftenComparedWith Adam optimizer
Adam optimizer hasVariant Adam optimizer (self-link; surface differs)
this entity surface form: AdamW
Adam: A Method for Stochastic Optimization influenced Adam optimizer
this entity surface form: AdamW optimizer
Diederik P. Kingma notableWork Adam optimizer
this entity surface form: Adam optimization algorithm
Diederik P. Kingma coDeveloperOf Adam optimizer
this entity surface form: Adam optimization algorithm
Diederik P. Kingma knownFor Adam optimizer