Adam
E701496
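Adam (Adaptive Moment Estimation) is a first-order stochastic optimization algorithm widely used to train deep learning models. It adapts the step size of each parameter individually, using bias-corrected exponential moving averages of the gradients and of the squared gradients.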
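As a rough sketch of what that per-parameter adaptation looks like, the update at step $t$ is commonly written as below. The symbols are assumptions that follow the usual notation rather than anything defined on this page: $\theta$ for the parameters, $g_t$ for the stochastic gradient, $\alpha$ for the learning rate, and $\beta_1$, $\beta_2$, $\epsilon$ for the beta1, beta2, and epsilon hyperparameters. Squares, square roots, and divisions are element-wise.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t            && \text{(moving average of gradients)} \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2}        && \text{(moving average of squared gradients)} \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{t}}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}}                 && \text{(bias correction)} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```

With $m_0 = v_0 = 0$, the bias-correction terms compensate for the averages starting at zero, which matters most during the first few steps.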
Statements (50)
| Predicate | Object |
|---|---|
| instanceOf | gradient-based optimization algorithm; optimization algorithm; stochastic optimization method |
| advantage | computationally efficient; invariant to diagonal rescaling of gradients; requires little hyperparameter tuning; works well with sparse gradients |
| basedOn | adaptive learning rate methods; momentum methods; stochastic gradient descent |
| category | first-order optimization algorithm |
| combinesIdeaOf | RMSProp; momentum |
| commonlyUsedIn | computer vision models; natural language processing models; reinforcement learning |
| defaultBeta1 | 0.9 |
| defaultBeta2 | 0.999 |
| defaultEpsilon | 1e-8 |
| defaultLearningRate | 0.001 |
| field | deep learning; machine learning |
| fullName | Adaptive Moment Estimation |
| hasHyperparameter | beta1; beta2; epsilon; learning rate |
| hasVariant | AMSGrad; AdaBound; AdamW |
| implementedIn | JAX; Keras; PyTorch; TensorFlow |
| introducedBy | Diederik P. Kingma; Jimmy Ba |
| introducedInPaper | Adam: A Method for Stochastic Optimization |
| limitation | can converge to different minima than SGD; may generalize worse than SGD with momentum in some settings |
| maintains | exponential moving average of gradients; exponential moving average of squared gradients |
| property | adaptive learning rate per parameter; bias-corrected first moment estimates; bias-corrected second moment estimates |
| publicationVenue | International Conference on Learning Representations |
| publicationYear | 2014 (arXiv preprint; presented at ICLR 2015) |
| usedFor | minimizing loss functions; neural network training; stochastic optimization; training deep learning models |
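
To make the statements above concrete, here is a minimal pure-Python sketch of a single Adam update, using the default hyperparameters listed in the table. It is an illustration under stated assumptions, not the implementation found in JAX, Keras, PyTorch, or TensorFlow: the function names `adam_step` and `minimize_quadratic` are hypothetical, parameters are plain Python lists, and the toy objective and its learning rate of 0.05 are chosen only for demonstration.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of scalar parameters.

    m and v are the running averages of gradients and squared gradients;
    t is the 1-based step count. Returns new (params, m, v) lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction: both averages start at zero.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Per-parameter step, scaled by the second-moment estimate.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.05):
    """Toy usage: minimize f(x, y) = (x - 3)^2 + 10 * (y + 1)^2."""
    params = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        # Analytic gradient of the toy objective (assumed for illustration).
        grads = [2.0 * (params[0] - 3.0), 20.0 * (params[1] + 1.0)]
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params
```

On the very first step the bias-corrected ratio reduces to roughly the sign of the gradient, so the step has magnitude close to the learning rate regardless of how large the gradient is. This is one way to see the "invariant to diagonal rescaling of gradients" advantage listed above.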