Reformer: The Efficient Transformer

E899034

Reformer: The Efficient Transformer is a research paper introducing a more memory- and computation-efficient Transformer architecture using techniques like locality-sensitive hashing attention and reversible layers.


Statements (48)

Predicate Object
instanceOf machine learning paper
neural network architecture
research paper
scientific publication
addresses quadratic memory complexity of self-attention
quadratic time complexity of self-attention
aimsTo enable training on very long sequences
reduce computational cost of Transformers
reduce memory usage of Transformers
applicationDomain language modeling
long-context tasks
sequence modeling
assumes similar tokens attend mostly to each other
basedOn Transformer architecture
category efficient Transformer variant
complexityClaim reduces attention complexity from O(L^2) to approximately O(L log L)
contribution demonstrates training on sequences with tens of thousands of tokens
shows LSH attention can approximate full attention with lower cost
shows reversible layers can significantly reduce activation memory
field deep learning
machine learning
natural language processing
neural networks
goal scale Transformers to longer sequences without prohibitive resource usage
improvesOn standard Transformer
influencedBy Reversible residual networks
locality-sensitive hashing
introducesTechnique chunked feed-forward layers
locality-sensitive hashing attention
reversible residual layers
shared query-key projections for attention
LSHAttentionProperty computes attention only within buckets
groups similar queries into buckets
optimizationTarget memory efficiency
time efficiency
proposes Reformer architecture
relatedTo Linformer
Longformer
Performer
Transformer
sparse attention models
reversibleLayerProperty reconstructs intermediate activations during backpropagation
stores only activations at boundaries between layers
title Reformer: The Efficient Transformer
uses approximate nearest neighbor search via LSH
position-wise feed-forward networks
reversible layers to recompute activations
self-attention mechanism
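
The complexityClaim statement above decomposes as follows, under the usual reading of the paper's analysis, with chunk size m and the number of hash rounds treated as constants (an illustrative reconstruction for this page, not notation from the paper):

```latex
% Cost of attending over a length-$L$ sequence:
\underbrace{\mathcal{O}(L^{2})}_{\text{full attention}}
\;\longrightarrow\;
\underbrace{\mathcal{O}(L \log L)}_{\text{sort positions by LSH bucket}}
+ \underbrace{\mathcal{O}(L\,m)}_{\text{attend within chunks of size } m}
= \mathcal{O}(L \log L) \quad \text{for fixed } m.
```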
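A minimal NumPy sketch of the LSHAttentionProperty and shared query-key statements above. The function names (`lsh_buckets`, `lsh_attention`) and shapes are assumptions made for this illustration; it uses the paper's angular LSH scheme but, for clarity, loops over buckets directly and omits the sort-and-chunk batching, multi-round hashing, causal masking, and the rule against a position attending to itself:

```python
import numpy as np

def lsh_buckets(x, n_buckets, rng):
    # Angular LSH: project onto random directions and take the argmax
    # over [xR; -xR], assigning each row of x to one of n_buckets buckets.
    r = rng.standard_normal((x.shape[-1], n_buckets // 2))
    rotated = x @ r                                   # (seq_len, n_buckets // 2)
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

def lsh_attention(x, w_qk, w_v, n_buckets=8, seed=0):
    # Shared query-key projection: one matrix produces both Q and K, so
    # queries and keys land in the same hash buckets.
    rng = np.random.default_rng(seed)
    qk = x @ w_qk
    v = x @ w_v
    buckets = lsh_buckets(qk, n_buckets, rng)
    out = np.zeros_like(v)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]               # positions hashed to bucket b
        scores = qk[idx] @ qk[idx].T / np.sqrt(qk.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]                   # attention only within the bucket
    return out
```

In the paper's formulation, positions are instead sorted by bucket id and split into fixed-size chunks so the per-bucket work can be batched, which is where the O(L log L) sorting cost in the decomposition above comes from.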
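The reversibleLayerProperty statements likewise admit a compact sketch. This is a generic RevNet-style coupling (the names `ReversibleBlock`, `f`, and `g` are assumptions for the illustration); in Reformer, F is the attention layer and G the feed-forward layer:

```python
import numpy as np

class ReversibleBlock:
    # Two-stream reversible residual coupling (RevNet-style):
    #   y1 = x1 + F(x2),  y2 = x2 + G(y1)
    def __init__(self, f, g):
        self.f, self.g = f, g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Recompute the inputs from the outputs during backpropagation,
        # so only activations at layer boundaries need to be stored.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

# Sanity check: inversion recovers the inputs (up to float round-off).
rng = np.random.default_rng(0)
w1, w2 = rng.standard_normal((16, 16)), rng.standard_normal((16, 16))
block = ReversibleBlock(lambda x: np.tanh(x @ w1), lambda x: np.tanh(x @ w2))
x1, x2 = rng.standard_normal((4, 16)), rng.standard_normal((4, 16))
y1, y2 = block.forward(x1, x2)
r1, r2 = block.inverse(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```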
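Finally, the chunked feed-forward technique listed under introducesTechnique rests on the observation that a position-wise feed-forward network treats every sequence position independently, so the sequence axis can be processed in slices. A hedged sketch (the name `chunked_ffn` and the weight shapes are assumptions):

```python
import numpy as np

def chunked_ffn(x, w1, b1, w2, b2, chunk_size=64):
    # Position-wise FFN applied chunk by chunk along the sequence axis:
    # peak memory for the wide hidden layer scales with chunk_size
    # rather than with the full sequence length.
    outs = []
    for start in range(0, x.shape[0], chunk_size):
        chunk = x[start:start + chunk_size]           # (<=chunk_size, d_model)
        hidden = np.maximum(chunk @ w1 + b1, 0.0)     # ReLU hidden layer
        outs.append(hidden @ w2 + b2)
    return np.concatenate(outs, axis=0)               # same result as unchunked FFN
```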

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

Lukasz Kaiser coAuthorOf Reformer: The Efficient Transformer
subject surface form: Łukasz Kaiser