Transformer encoder-only

E457857

A Transformer encoder-only model is a neural network architecture that uses only the encoder stack of the Transformer to process input sequences, typically for tasks like classification, retrieval, and masked language modeling.
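The description above can be made concrete with a minimal sketch, assuming PyTorch (one of the frameworks listed under commonlyImplementedIn below); all class and parameter names here are illustrative, and the learned positional embedding and mean pooling are just two of several common choices.

```python
import torch
import torch.nn as nn

class EncoderOnlyClassifier(nn.Module):
    """Illustrative encoder-only model for sequence classification."""
    def __init__(self, vocab_size=1000, hidden_size=64, num_heads=4,
                 num_layers=2, max_len=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        # learned positional encoding (sinusoidal is another common option)
        self.pos = nn.Embedding(max_len, hidden_size)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads,
            dim_feedforward=4 * hidden_size, batch_first=True)
        # encoder stack only -- no decoder, no causal mask
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.embed(token_ids) + self.pos(positions)
        x = self.encoder(x)              # bidirectional self-attention
        return self.head(x.mean(dim=1))  # pool over positions, then classify

model = EncoderOnlyClassifier()
logits = model(torch.randint(0, 1000, (2, 16)))  # batch of 2 sequences, length 16
print(logits.shape)  # torch.Size([2, 2])
```

Because every position attends to every other position, this kind of model captures bidirectional context, which is why it suits classification and embedding tasks rather than autoregressive generation.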


Statements (54)

Predicate Object
instanceOf Transformer-based model
neural network architecture
advantage captures bidirectional context
parallelizable over sequence positions
attentionDirection bidirectional
attentionType self-attention
basedOn Transformer architecture
canBe fine-tuned model
pretrained language model
canBeAppliedTo multimodal tasks
vision tasks
commonlyImplementedIn JAX
PyTorch
TensorFlow
doesNotUseComponent Transformer decoder stack
domain natural language processing
hasComponent layer normalization
multi-head self-attention layer
position-wise feed-forward network
positional encoding
residual connections
hyperparameter dropout rate
hidden size
intermediate feed-forward size
maximum sequence length
number of attention heads
number of encoder layers
inputType continuous embeddings
token sequences
limitation not directly suitable for autoregressive generation
outputType pooled representation
sequence representations
relatedModel ALBERT
BERT
DeBERTa
DistilBERT
ELECTRA
RoBERTa
trainingObjective classification loss
contrastive loss
masked language modeling loss
metric learning loss
typicalUse document classification
document embedding
information retrieval
masked language modeling
named entity recognition
semantic search
sentence classification
sentence embedding
sequence tagging
text classification
token classification
usesComponent Transformer encoder stack
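The masked language modeling loss listed under trainingObjective can be sketched as follows, assuming PyTorch; the 15% mask rate follows common practice (e.g. BERT), and the random tensor standing in for the encoder output, along with all names, is purely illustrative.

```python
import torch
import torch.nn.functional as F

vocab_size, mask_id, ignore = 1000, 0, -100
tokens = torch.randint(1, vocab_size, (2, 16))   # batch of token sequences

# choose ~15% of positions to mask (forcing at least one per sequence)
mask = torch.rand(tokens.shape) < 0.15
mask[:, 0] = True
inputs = tokens.masked_fill(mask, mask_id)       # replace chosen tokens with [MASK]
labels = tokens.masked_fill(~mask, ignore)       # compute loss only at masked slots

# an encoder-only model producing per-token logits from `inputs` would go
# here; random logits stand in for the encoder in this sketch
logits = torch.randn(2, 16, vocab_size)
loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1),
                       ignore_index=ignore)
```

The model must reconstruct the masked tokens from the surrounding unmasked context on both sides, which is exactly what the bidirectional attention listed above enables.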

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

Transformer hasVariant Transformer encoder-only