Reformer: The Efficient Transformer
E899034
Reformer: The Efficient Transformer is a research paper introducing a more memory- and computation-efficient Transformer architecture using techniques like locality-sensitive hashing attention and reversible layers.
Statements (48)
| Predicate | Object |
|---|---|
| instanceOf | machine learning paper; neural network architecture; research paper; scientific publication |
| addresses | quadratic memory complexity of self-attention; quadratic time complexity of self-attention |
| aimsTo | enable training on very long sequences; reduce computational cost of Transformers; reduce memory usage of Transformers |
| applicationDomain | language modeling; long-context tasks; sequence modeling |
| assumes | similar tokens attend mostly to each other |
| basedOn | Transformer architecture |
| category | efficient Transformer variant |
| complexityClaim | reduces attention complexity from O(L^2) to approximately O(L log L) |
| contribution | demonstrates training on sequences with tens of thousands of tokens; shows LSH attention can approximate full attention at lower cost; shows reversible layers can significantly reduce activation memory |
| field | deep learning; machine learning; natural language processing; neural networks |
| goal | scale Transformers to longer sequences without prohibitive resource usage |
| improvesOn | standard Transformer |
| influencedBy | reversible residual networks; locality-sensitive hashing |
| introducesTechnique | chunked feed-forward layers; locality-sensitive hashing attention; reversible residual layers; shared query-key projections for attention |
| LSHAttentionProperty | computes attention only within buckets; groups similar queries into buckets |
| optimizationTarget | memory efficiency; time efficiency |
| proposes | Reformer architecture |
| relatedTo | Linformer; Longformer; Performer; Transformer; sparse attention models |
| reversibleLayerProperty | reconstructs intermediate activations during backpropagation; stores only activations at boundaries between layers |
| title | Reformer: The Efficient Transformer |
| uses | approximate nearest neighbor search via LSH; position-wise feed-forward networks; reversible layers to recompute activations; self-attention mechanism |
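The LSHAttentionProperty statements above (grouping similar queries into buckets and attending only within each bucket) can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the function names, the single hash round, and the fixed bucket count are assumptions made for clarity, and batching, chunking, and multi-round hashing are omitted.

```python
import numpy as np

def lsh_bucket_ids(vecs, n_buckets=8, seed=0):
    # Angular LSH via random projection: project each vector onto
    # n_buckets/2 random directions, then take the argmax over the
    # concatenated [proj, -proj] scores as the bucket id. Vectors with
    # high cosine similarity tend to land in the same bucket.
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(vecs.shape[-1], n_buckets // 2))
    proj = vecs @ R
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

def lsh_attention(qk, v, n_buckets=8):
    # Shared query-key projections: the same array `qk` serves as both
    # queries and keys. Softmax attention is computed only within each
    # bucket, never across the full sequence.
    buckets = lsh_bucket_ids(qk, n_buckets=n_buckets)
    out = np.zeros_like(v, dtype=float)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = (qk[idx] @ qk[idx].T) / np.sqrt(qk.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[idx] = w @ v[idx]
    return out
```

Because each query attends only to the keys in its own bucket, the cost scales with bucket size rather than sequence length, which is the source of the O(L log L) complexity claim in the table.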
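The reversibleLayerProperty statements (storing only boundary activations and reconstructing the rest during backpropagation) rest on a RevNet-style coupling, sketched below under assumed function names; the actual Reformer layers use attention and feed-forward sublayers for F and G.

```python
import numpy as np

def rev_forward(x1, x2, F, G):
    # Reversible residual block: y1 = x1 + F(x2), y2 = x2 + G(y1).
    # The outputs determine the inputs exactly.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2, F, G):
    # During the backward pass, inputs are recomputed from outputs,
    # so per-layer activations need not be stored in memory.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```

Inverting each block in turn recovers every intermediate activation from the final layer's output alone, so activation memory no longer grows with network depth.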
Referenced by (1)
Full triples — surface form annotated when it differs from this entity's canonical label.
- subject surface form: Łukasz Kaiser