AdaGrad
E565192
AdaGrad is an adaptive gradient descent optimization algorithm that adjusts learning rates for individual parameters based on their historical gradients, often improving convergence in sparse settings.
Observed surface forms (1)
| Surface form | Occurrences |
|---|---|
| AMSGrad | 1 |
Statements (47)
| Predicate | Object |
|---|---|
| instanceOf | adaptive learning rate method; optimization algorithm |
| appliedIn | computer vision; natural language processing; online learning; recommender systems; stochastic gradient descent variants |
| basedOn | gradient descent |
| category | first-order optimization method |
| comparedWith | Adam; RMSProp; SGD |
| defines | G_t as sum of past squared gradients |
| describedIn | Adaptive Subgradient Methods for Online Learning and Stochastic Optimization |
| fullName | Adaptive Gradient Algorithm |
| hasProperty | accumulates squared gradients; adaptive learning rate; diagonal preconditioning; element-wise parameter updates; monotonically decreasing learning rates; no need for manual learning rate decay schedule; often improves convergence in sparse settings; per-parameter learning rates; scale-invariant to gradient magnitude; sensitive to learning rate hyperparameter; well-suited for sparse data |
| implementedIn | PyTorch; TensorFlow; scikit-learn |
| influenced | Adadelta; Adam; RMSProp |
| introducedIn | 2011 |
| limitation | learning rate can become too small over time; may converge slowly in non-sparse settings |
| operatesOn | model parameters; stochastic gradients |
| proposedBy | Elad Hazan; John Duchi; Yoram Singer |
| publishedAt | Journal of Machine Learning Research |
| updateRule | theta_t = theta_{t-1} - (eta / (sqrt(G_t) + epsilon)) * g_t |
| usedFor | optimizing objective functions; stochastic optimization; training machine learning models |
| uses | epsilon for numerical stability; global initial learning rate |
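The updateRule and defines statements above can be sketched as a minimal NumPy implementation. This is an illustrative sketch, not a reference implementation: the function name `adagrad_step` and the toy quadratic objective are assumptions for demonstration; only the accumulator `G_t` and the element-wise update follow the stated rule.

```python
import numpy as np

def adagrad_step(theta, g, G, eta=0.5, epsilon=1e-8):
    """One AdaGrad update (hypothetical helper for illustration).

    G_t = G_{t-1} + g_t^2           -- element-wise sum of past squared gradients
    theta_t = theta_{t-1} - (eta / (sqrt(G_t) + epsilon)) * g_t
    """
    G = G + g * g
    theta = theta - (eta / (np.sqrt(G) + epsilon)) * g
    return theta, G

# Toy usage: minimize f(theta) = theta^2 starting from theta = 5.
theta = np.array([5.0])
G = np.zeros_like(theta)
for _ in range(500):
    g = 2 * theta                  # gradient of theta^2
    theta, G = adagrad_step(theta, g, G)
print(theta)
```

Note how the per-parameter divisor `sqrt(G) + epsilon` grows monotonically, which both removes the need for a manual decay schedule and causes the limitation listed above: the effective learning rate can shrink toward zero over long runs.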
Referenced by (7)
Full triples — surface form annotated when it differs from this entity's canonical label.
this entity surface form: AMSGrad