Reformer architecture
E899032
The Reformer is a Transformer-based neural network architecture that improves efficiency by replacing standard self-attention with locality-sensitive hashing (LSH) attention and by using reversible residual layers, greatly reducing memory usage and computational cost.
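To make the scale of the saving concrete: for a sequence of L = 65,536 tokens, full self-attention scores L² ≈ 4.3 × 10⁹ query-key pairs, while hashing and sorting scale as L log₂ L ≈ 1.0 × 10⁶. The sketch below is a minimal single-round, single-head illustration of the LSH attention idea in NumPy, not the paper's implementation: it hashes a shared query-key projection with random rotations, sorts tokens by bucket, and attends within fixed-size chunks of the sorted sequence. All function and variable names are illustrative, and refinements from the paper (multiple hash rounds, attending to the preceding chunk, causal masking) are omitted.

```python
import numpy as np

def lsh_bucket_ids(x, n_buckets, rng):
    """Angular LSH (illustrative): project onto random directions and
    take the argmax over [xR; -xR] to get one bucket id per token."""
    r = rng.standard_normal((x.shape[-1], n_buckets // 2))
    rotated = x @ r                                  # (L, n_buckets // 2)
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

def lsh_attention(x, w_qk, w_v, n_buckets=8, chunk_size=16, rng=None):
    """Single-round, single-head LSH attention sketch (no masking,
    no look-back to the previous chunk, unlike the full method)."""
    if rng is None:
        rng = np.random.default_rng(0)
    qk = x @ w_qk                                    # shared query-key projection
    v = x @ w_v
    buckets = lsh_bucket_ids(qk, n_buckets, rng)
    order = np.argsort(buckets, kind="stable")       # group equal buckets together
    qk_s, v_s = qk[order], v[order]

    out_sorted = np.zeros_like(v_s)
    for start in range(0, len(x), chunk_size):       # attend within chunks only
        sl = slice(start, start + chunk_size)
        scores = qk_s[sl] @ qk_s[sl].T / np.sqrt(qk.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        out_sorted[sl] = (w / w.sum(axis=-1, keepdims=True)) @ v_s[sl]

    out = np.empty_like(out_sorted)
    out[order] = out_sorted                          # undo the bucket sort
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((128, 32))                   # 128 tokens, width 32
y = lsh_attention(x, rng.standard_normal((32, 32)), rng.standard_normal((32, 32)))
print(y.shape)                                       # (128, 32)
```

Sorting by bucket is what lets attention run over contiguous chunks of memory; this corresponds to the usesSorting and attentionRestriction statements below.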
Statements (48)
| Predicate | Object |
|---|---|
| instanceOf | Transformer-based model; neural network architecture |
| aimsTo | improve Transformer efficiency; reduce computational cost; reduce memory usage |
| attentionRestriction | within hash buckets |
| basedOn | Transformer architecture |
| belongsTo | efficient Transformer family |
| comparedTo | standard Transformer |
| competesWith | Linformer; Longformer; Sparse Transformer |
| designedFor | large-context tasks; long sequence modeling; memory-efficient training |
| evaluationDomain | language modeling benchmarks |
| groupsTokensBy | hash codes |
| hasComponent | LSH-based self-attention layer; position-wise feed-forward network; reversible residual block |
| hasKeyFeature | chunked feed-forward layers; locality-sensitive hashing attention; reversible residual layers; shared query-key projection |
| implementedIn | JAX; PyTorch; TensorFlow |
| inspiredBy | locality-sensitive hashing methods |
| introducedInPaper | Reformer: The Efficient Transformer |
| memoryOptimizationTechnique | activation recomputation from outputs; reversible residual computation (sketched after this table) |
| optimizationGoal | scalability to very long sequences |
| proposedBy | Anselm Levskaya; Nikita Kitaev; Łukasz Kaiser |
| publicationYear | 2020 |
| publishedBy | Google researchers |
| reduces | activation memory footprint; attention computation cost |
| reducesComplexityOf | self-attention |
| standardTransformerComplexity | O(L^2) |
| supports | autoregressive language modeling; sequence-to-sequence tasks |
| targetComplexity | O(L log L) |
| uses | LSH attention; locality-sensitive hashing; reversible layers |
| usesSorting | hash buckets for attention |
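The memoryOptimizationTechnique statements above describe the RevNet-style trick: in a reversible residual block, a layer's inputs can be recomputed exactly from its outputs, so per-layer activations need not be stored for backpropagation. Below is a minimal sketch under the assumption that F and G stand in for the attention and feed-forward sublayers; all names are illustrative.

```python
import numpy as np

def rev_block_forward(x1, x2, f, g):
    """Reversible residual block: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_block_inverse(y1, y2, f, g):
    """Recover the inputs from the outputs, so activations can be
    recomputed during the backward pass instead of stored."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

rng = np.random.default_rng(0)
w_f, w_g = rng.standard_normal((16, 16)), rng.standard_normal((16, 16))
f = lambda h: np.tanh(h @ w_f)   # stand-in for the attention sublayer
g = lambda h: np.tanh(h @ w_g)   # stand-in for the feed-forward sublayer

x1, x2 = rng.standard_normal((4, 16)), rng.standard_normal((4, 16))
y1, y2 = rev_block_forward(x1, x2, f, g)
r1, r2 = rev_block_inverse(y1, y2, f, g)
assert np.allclose(x1, r1) and np.allclose(x2, r2)   # inputs recovered exactly
```

Because the inverse reuses the same F and G evaluations, activation memory stops growing with network depth, at the price of recomputing activations during the backward pass.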
Referenced by (1)
Full triples are shown with the surface form annotated when it differs from this entity's canonical label.
Subject surface form: Łukasz Kaiser