Attention Is All You Need

E457850

"Attention Is All You Need" is the landmark 2017 research paper that introduced the Transformer architecture and revolutionized modern natural language processing and sequence modeling.

All labels observed (1)

Label Occurrences
Attention Is All You Need canonical 2

Statements (53)

Predicate Object
instanceOf computer science paper
research paper
scientific paper
affiliatedInstitution Google Brain
Google Research
applicationDomain machine translation
architectureType encoder-decoder
benchmarkDataset WMT 2014 English-to-French translation
WMT 2014 English-to-German translation
citationStatus highly cited paper
enabled parallel training of sequence models
field deep learning
machine learning
natural language processing
sequence modeling
hasAuthor Aidan N. Gomez
Ashish Vaswani
Illia Polosukhin
Jakob Uszkoreit
Llion Jones
Niki Parmar
Noam Shazeer
Łukasz Kaiser
impact became foundational for large language models
revolutionized modern natural language processing
inspiredModel BERT
GPT series
T5
introducedConcept Transformer architecture
multi-head attention
positional encoding
scaled dot-product attention
self-attention mechanism
optimizationMethod Adam optimizer
outperformed previous state-of-the-art machine translation models
proposedModel Transformer
publicationYear 2017
publishedIn Advances in Neural Information Processing Systems 30
publishedInConference NeurIPS 2017
publisher Neural Information Processing Systems Foundation
reduced sequential computation in sequence models
replacedArchitecture GRU networks
LSTM networks
recurrent neural networks
title Attention Is All You Need
usesComponent dropout regularization
layer normalization
multi-head self-attention layers
position-wise feed-forward networks
residual connections
stacked decoder layers
stacked encoder layers
usesTechnique label smoothing

Referenced by (2)

Full triples — surface form annotated when it differs from this entity's canonical label.

Transformer introducedInPaper Attention Is All You Need
Lukasz Kaiser coAuthorOf Attention Is All You Need
subject surface form: Łukasz Kaiser