Adam optimizer
E182821
The Adam optimizer is a stochastic gradient-based optimization method, widely used in machine learning, that adapts the learning rate of each parameter using running estimates of the first and second moments of the gradients.
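As a rough illustration of the description above, here is a minimal pure-Python sketch of a single Adam update, using the default hyperparameters listed in the statements on this page. The function names `adam_step` and `minimize_quadratic` and the toy quadratic objective are assumptions made for this example. They are not taken from any particular library.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of floats.

    `t` is the 1-based step counter. Returns new (params, m, v) lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponentially decaying averages of the gradient and squared gradient.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialisation of m and v.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter step: the scale adapts to the gradient history.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.01):
    """Toy example: minimise f(x, y) = (x - 3)^2 + 10 * (y + 1)^2."""
    params = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        grads = [2 * (params[0] - 3), 20 * (params[1] + 1)]
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr` regardless of the gradient's scale. This is one way to see the per-parameter adaptivity in action.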
All labels observed (4)
| Label | Occurrences |
|---|---|
| Adam optimizer (canonical) | 5 |
| Adam optimization algorithm | 2 |
| AdamW | 1 |
| AdamW optimizer | 1 |
Statements (56)
| Predicate | Object |
|---|---|
| instanceOf | adaptive learning rate method; optimization algorithm; stochastic gradient descent variant |
| baseAlgorithm | stochastic gradient descent |
| category | adaptive gradient method |
| commonlyUsedIn | computer vision models; deep learning; natural language processing models; neural network training; reinforcement learning |
| defaultBeta1 | 0.9 |
| defaultBeta2 | 0.999 |
| defaultEpsilon | 1e-8 |
| defaultLearningRate | 0.001 |
| fullName | Adaptive Moment Estimation |
| gradientRequirement | first-order gradients |
| hasVariant | AMSGrad; AdamW |
| hyperparameter | beta1; beta2; epsilon; learning rate; weight decay (in some implementations) |
| implementedIn | Chainer; JAX; Keras; MXNet; PyTorch; TensorFlow; fastai |
| inspiredBy | AdaGrad; RMSProp |
| introducedBy | Diederik P. Kingma; Jimmy Ba |
| introducedInPaper | Adam: A Method for Stochastic Optimization |
| maintains | per-parameter first moment vector; per-parameter second moment vector |
| optimizationTarget | minimization of loss function |
| parameterUpdateDependsOn | current gradient; first moment estimate; second moment estimate |
| performsBiasCorrection | true |
| publicationYear | 2014 |
| relatedTo | AdaDelta; AdaGrad; RMSProp |
| strength | fast convergence in practice |
| suitableFor | large-scale problems; non-stationary objectives; sparse gradients |
| updateRuleType | adaptive learning rate |
| uses | exponentially decaying averages of past gradients; exponentially decaying averages of past squared gradients |
| usesFirstMomentEstimate | gradient mean |
| usesSecondMomentEstimate | uncentered gradient variance |
| weakness | can generalize worse than SGD with momentum in some settings |
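
The `performsBiasCorrection` statement can be motivated with a short derivation. The notation is assumed here rather than taken from this page: $g_i$ is the gradient at step $i$, $m_t$ is the first moment estimate with $m_0 = 0$, and $\beta_1$ is the `beta1` hyperparameter. The derivation also makes the simplifying assumption that all gradients share the same expectation $\mathbb{E}[g]$.

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t = (1 - \beta_1) \sum_{i=1}^{t} \beta_1^{\,t-i}\, g_i \\
\mathbb{E}[m_t] &= (1 - \beta_1) \sum_{i=1}^{t} \beta_1^{\,t-i}\, \mathbb{E}[g] = \left(1 - \beta_1^{\,t}\right) \mathbb{E}[g] \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{\,t}} \quad\Longrightarrow\quad \mathbb{E}[\hat{m}_t] = \mathbb{E}[g]
\end{aligned}
$$

For example, with $\beta_1 = 0.9$ and $t = 1$, the raw estimate is $m_1 = 0.1\, g_1$ and the corrected estimate is $\hat{m}_1 = g_1$. The same argument applies to the second moment estimate, with $\beta_2$ in place of $\beta_1$ and $g_i^2$ in place of $g_i$.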
Referenced by (9)
Full triples — surface form annotated when it differs from this entity's canonical label.
this entity surface form:
AdamW
this entity surface form:
AdamW optimizer
this entity surface form:
Adam optimization algorithm
this entity surface form:
Adam optimization algorithm