Adam
E701497
Adam is a widely used stochastic optimization algorithm in machine learning that combines ideas from momentum and adaptive learning rates to efficiently train deep neural networks.
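For context, these are the per-step update equations as given in the paper "Adam: A Method for Stochastic Optimization" (cited in the statements below), where $g_t$ is the stochastic gradient at step $t$, $\alpha$ the learning rate, $\beta_1, \beta_2$ the decay rates, and $\epsilon$ a small stability constant:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t) \\
\theta_t &= \theta_{t-1} - \alpha\, \hat{m}_t / \big(\sqrt{\hat{v}_t} + \epsilon\big)
\end{aligned}
$$

The division and square root are element-wise, which is what gives each parameter its own adaptive learning rate.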
Statements (46)
| Predicate | Object |
|---|---|
| instanceOf | optimization algorithm; stochastic optimization method |
| abbreviationFor | Adaptive Moment Estimation |
| appliedIn | computer vision; natural language processing; reinforcement learning; speech recognition |
| basedOn | adaptive learning rates; momentum; stochastic gradient descent |
| commonVariant | AMSGrad; AdamW |
| comparedWith | AdaGrad; RMSProp; SGD with momentum |
| defaultHyperparameter | beta1 = 0.9; beta2 = 0.999; epsilon = 1e-8; learning rate = 0.001 |
| describedIn | Adam: A Method for Stochastic Optimization |
| field | deep learning; machine learning |
| hasProperty | computationally efficient; handles sparse gradients; memory efficient; invariant to gradient rescaling; suitable for high-dimensional parameter spaces; suitable for large datasets |
| implementedIn | JAX; Keras; PyTorch; TensorFlow |
| introducedIn | 2014 |
| optimizationType | first-order method |
| performs | bias correction of moment estimates |
| proposedBy | Diederik P. Kingma; Jimmy Ba |
| publishedAt | International Conference on Learning Representations |
| publishedIn | 2015 |
| updates | parameters with element-wise adaptive learning rates (see the sketch below) |
| uses | exponentially decaying averages of past gradients; exponentially decaying averages of past squared gradients; first moment estimates of gradients; second moment estimates of gradients |
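To tie the statements above together (the default hyperparameters, the bias correction, and the element-wise adaptive update), here is a minimal NumPy sketch of a single Adam step. The function name and toy objective are illustrative only, not taken from any of the implementations listed above.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; defaults match the defaultHyperparameter statements."""
    # Exponentially decaying averages of past gradients / squared gradients
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment estimate
    # Bias correction of the moment estimates (t is 1-indexed)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Element-wise adaptive update
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage: minimize ||theta - target||^2
target = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 5001):
    grad = 2 * (theta - target)               # exact gradient of the quadratic
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # approaches target
```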
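Since the implementedIn statements list PyTorch, the built-in optimizer can also be used directly; a minimal, illustrative training step (the model and data here are placeholders):

```python
import torch

model = torch.nn.Linear(10, 1)
# Defaults written out explicitly; they match the defaultHyperparameter row
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```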
Referenced by (2)
Full triples — surface form annotated when it differs from this entity's canonical label.