AdaDelta

E565193

AdaDelta is an adaptive learning rate optimization algorithm for training neural networks. It improves upon methods such as RMSProp by eliminating the need to manually set a global learning rate.
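
The update rule from Zeiler's 2012 paper can be summarized in a few recurrences; the restatement below follows the paper's notation, with decay rate ρ and small constant ε as the only hyperparameters:

```latex
% AdaDelta recurrences for a parameter x_t with gradient g_t
E[g^2]_t = \rho\, E[g^2]_{t-1} + (1 - \rho)\, g_t^2
\Delta x_t = -\frac{\mathrm{RMS}[\Delta x]_{t-1}}{\mathrm{RMS}[g]_t}\, g_t,
  \qquad \mathrm{RMS}[\cdot]_t = \sqrt{E[\cdot^2]_t + \epsilon}
E[\Delta x^2]_t = \rho\, E[\Delta x^2]_{t-1} + (1 - \rho)\, \Delta x_t^2
x_{t+1} = x_t + \Delta x_t
```

Because the numerator carries the same units as the parameter updates themselves, no global learning rate appears anywhere in the rule.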

Statements (39)

Predicate Object
instanceOf adaptive learning rate method
optimization algorithm
stochastic gradient-based optimization method
appliedIn computer vision
natural language processing
speech recognition
basedOn stochastic gradient descent
comparedWith AdaGrad
Adam
Momentum
Nesterov momentum
RMSProp
SGD
describedIn ADADELTA: An Adaptive Learning Rate Method
field deep learning
machine learning
goal accelerate convergence in deep networks
improve training stability
reduce sensitivity to initial learning rate choice
hasCharacteristic adaptive per-parameter learning rates
no need for manual global learning rate
robust to choice of hyperparameters
scale-invariant update rule
uses running averages of squared gradients
uses running averages of squared parameter updates
hasHyperparameter decay rate rho
epsilon
implementedIn Keras
MXNet
PyTorch
TensorFlow
Theano
improvesUpon RMSProp
introducedBy Matthew D. Zeiler
optimizationType first-order method
publicationYear 2012
updateRule uses ratio of RMS of accumulated parameter updates to RMS of accumulated gradients (see the sketch after this list)
usedFor minimizing loss functions
training neural networks
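
A minimal NumPy sketch of a single AdaDelta step, consistent with the update rule above (the function name and state layout are illustrative, not taken from any library):

```python
import numpy as np

def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """One AdaDelta update; rho and eps are the method's only hyperparameters."""
    # Decaying average of squared gradients.
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * grad**2
    # Scale-invariant step: RMS of past updates over RMS of current gradients,
    # so no global learning rate is needed.
    delta = -np.sqrt(state["Edx2"] + eps) / np.sqrt(state["Eg2"] + eps) * grad
    # Decaying average of squared parameter updates (uses the step just taken).
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * delta**2
    return x + delta, state

# Toy usage: minimize f(x) = ||x||^2 / 2, whose gradient is x.
x = np.array([1.0, -2.0])
state = {"Eg2": np.zeros_like(x), "Edx2": np.zeros_like(x)}
for _ in range(200):
    x, state = adadelta_step(x, grad=x, state=state)
```

The listed framework implementations expose the same two hyperparameters; for example, PyTorch's torch.optim.Adadelta(params, rho=0.9, eps=1e-6) matches this sketch, though PyTorch also adds an optional lr scaling factor (default 1.0).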

Referenced by (2)

Full triples — surface form annotated when it differs from this entity's canonical label.

RMSProp relatedTo AdaDelta
Adam optimizer relatedTo AdaDelta