Adam

E701496

Adam (Adaptive Moment Estimation) is a stochastic optimization algorithm widely used to train deep learning models; it adaptively adjusts the learning rate of each parameter.

Statements (50)

Predicate Object
instanceOf gradient-based optimization algorithm
optimization algorithm
stochastic optimization method
advantage computationally efficient
invariant to diagonal rescaling of gradients
requires little hyperparameter tuning
works well with sparse gradients
basedOn adaptive learning rate methods
momentum methods
stochastic gradient descent
category first-order optimization algorithm
combinesIdeaOf RMSProp
momentum
commonlyUsedIn computer vision models
natural language processing models
reinforcement learning
defaultBeta1 0.9
defaultBeta2 0.999
defaultEpsilon 1e-8
defaultLearningRate 0.001
field deep learning
machine learning
fullName Adaptive Moment Estimation
hasHyperparameter beta1
beta2
epsilon
learning rate
hasVariant AMSGrad
AdaBound
AdamW
implementedIn JAX
Keras
PyTorch
TensorFlow
introducedBy Diederik P. Kingma
Jimmy Ba
introducedInPaper Adam: A Method for Stochastic Optimization
limitation can converge to different minima than SGD
may generalize worse than SGD with momentum in some settings
maintains exponential moving average of gradients
exponential moving average of squared gradients
property adaptive learning rate per parameter
bias-corrected first moment estimates
bias-corrected second moment estimates
publicationVenue International Conference on Learning Representations
publicationYear 2015
usedFor minimizing loss functions
neural network training
stochastic optimization
training deep learning models
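
The statements above describe Adam's update rule: exponential moving averages of the gradient and the squared gradient, bias-corrected before use, with the listed default hyperparameters. A minimal NumPy sketch of one update step, using illustrative variable names:

import numpy as np

def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected first and second moment estimates (t is 1-based).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive step.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = ||theta||^2 from a random start.
theta = np.random.randn(3)
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)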
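
The listed defaults (learning rate 0.001, beta1 0.9, beta2 0.999, epsilon 1e-8) match the defaults of torch.optim.Adam in PyTorch, one of the implementations named above. A short usage sketch with an illustrative model and batch:

import torch

model = torch.nn.Linear(10, 1)                     # illustrative model
opt = torch.optim.Adam(model.parameters(),
                       lr=0.001, betas=(0.9, 0.999), eps=1e-8)

x, y = torch.randn(32, 10), torch.randn(32, 1)     # illustrative batch
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()                                         # one Adam update
opt.zero_grad()

The AdamW variant listed above is available in the same module as torch.optim.AdamW.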

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.