GPT-1
E469810
GPT-1 is the first-generation Generative Pre-trained Transformer language model developed by OpenAI, introducing the pretrain-then-finetune paradigm for large-scale NLP.
Statements (48)
| Predicate | Object |
|---|---|
| instanceOf | Generative Pre-trained Transformer; autoregressive language model; large language model |
| architectureDepth | 12 layers |
| basedOn | Transformer architecture |
| coAuthor | Ilya Sutskever; Karthik Narasimhan; Tim Salimans |
| designGoal | improve sample efficiency for NLP tasks; leverage unsupervised data for representation learning |
| developer | OpenAI |
| field | artificial intelligence; machine learning; natural language processing |
| fineTuningTasks | natural language inference; question answering; reading comprehension; semantic similarity; text classification |
| improvedOver | task-specific models trained from scratch |
| inferenceMode | left-to-right generation |
| influenced | GPT-2; GPT-3; subsequent large language models |
| inputType | text |
| introducedConcept | large-scale unsupervised pretraining for NLP; task-specific supervised fine-tuning after pretraining |
| language | English |
| modelType | unidirectional transformer |
| notableContribution | demonstrated transfer learning in NLP; showed that a single pretrained model can be adapted to many tasks |
| numberOfParameters | 117M |
| organization | OpenAI |
| outputType | text |
| parameterCountCategory | hundreds of millions of parameters |
| pretrainingDataType | BooksCorpus (unpublished books) |
| pretrainingTask | language modeling |
| primaryAuthor | Alec Radford |
| publicationTitle | Improving Language Understanding by Generative Pre-Training |
| publicationVenue | OpenAI technical report |
| publicationYear | 2018 |
| tokenizerType | Byte Pair Encoding |
| trainingMethod | supervised fine-tuning; unsupervised pretraining |
| trainingObjective | next-token prediction |
| trainingParadigm | pretrain-then-finetune |
| uses | multi-head self-attention; positional embeddings |
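The trainingParadigm, trainingObjective, modelType, and uses statements above describe a two-stage recipe: unsupervised next-token prediction with a unidirectional (causal) transformer, followed by supervised fine-tuning of a small task head. The PyTorch sketch below is illustrative only and is not GPT-1's released code; the `TinyCausalLM` class, its toy hyperparameters, and the `torch.nn.TransformerEncoder` building blocks are assumptions chosen for brevity.

```python
# Illustrative sketch (not OpenAI's implementation) of the two stages in the
# statements above: unsupervised pretraining with a next-token prediction
# objective, then supervised fine-tuning of a task head. Toy hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyCausalLM(nn.Module):
    """Unidirectional transformer: multi-head self-attention under a causal
    mask plus learned positional embeddings, per the 'uses' statements."""

    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos_emb = nn.Embedding(max_len, d_model)      # learned positional embeddings
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)      # next-token prediction head

    def forward(self, tokens):
        b, t = tokens.shape
        pos = torch.arange(t, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask: each position attends only to itself and earlier
        # positions, which is what makes inference left-to-right.
        mask = torch.triu(torch.full((t, t), float("-inf"), device=tokens.device), diagonal=1)
        return self.blocks(x, mask=mask)                   # hidden states (B, T, D)


# --- Stage 1: unsupervised pretraining (next-token prediction) ---
model = TinyCausalLM()
tokens = torch.randint(0, 1000, (2, 16))                   # toy batch of token ids
hidden = model(tokens[:, :-1])
logits = model.lm_head(hidden)                             # predict token t+1 from tokens <= t
lm_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          tokens[:, 1:].reshape(-1))       # targets are the shifted inputs
lm_loss.backward()

# --- Stage 2: supervised fine-tuning (e.g. text classification) ---
# A linear head reads the hidden state at the final position and is trained
# on labeled task data while the pretrained body is updated as well.
clf_head = nn.Linear(64, 2)                                # 2-class toy task
labels = torch.tensor([0, 1])
hidden = model(tokens)                                     # reuse the pretrained body
task_loss = F.cross_entropy(clf_head(hidden[:, -1, :]), labels)
task_loss.backward()
```

GPT-1 itself stacked 12 such blocks (about 117M parameters) over Byte Pair Encoding tokens and, during fine-tuning, also mixed an auxiliary language-modeling loss into the task objective; the sketch keeps only the two core losses.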
Referenced by (1)
Full triples — surface form annotated when it differs from this entity's canonical label.