GPT-2
E18339
OpenAI model · autoregressive language model · large language model · neural network · transformer-based model
GPT-2 is a large transformer-based language model developed by OpenAI, known for generating coherent, human-like text and for sparking widespread discussion about the implications of advanced AI text generation.
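A minimal sketch of that generation behavior, using the Hugging Face Transformers implementation listed under openSourceImplementation in the statements below. The `gpt2` checkpoint name (the smallest, 117M-parameter version) and the sampling settings are illustrative assumptions, not part of this entry.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "In a shocking finding, scientists discovered"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=50,                        # well under the 1024-token context window
    do_sample=True,                       # sample instead of greedy decoding
    top_k=40,                             # restrict sampling to the 40 likeliest tokens
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```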
Aliases (2)
Statements (53)
| Predicate | Object |
|---|---|
| instanceOf | OpenAI model; autoregressive language model; large language model; neural network; transformer-based model |
| announcementDate | 2019-02-14 |
| architectureType | decoder-only transformer |
| basedOn | Transformer architecture |
| contextWindowSize | 1024 tokens |
| developer | OpenAI |
| evaluationSetting | zero-shot evaluation on downstream tasks (see the second sketch after this table) |
| framework | TensorFlow |
| fullModelReleaseDate | 2019-11 |
| implementationLanguage | Python |
| influenced | subsequent large language models |
| initialReleasePolicy | partial model release; staged release |
| inputType | text |
| language | English |
| largestVersionParameters | 1.5B |
| license | OpenAI model license |
| notableFor | generating coherent long-form text; public debate about AI misuse risks; zero-shot performance on many NLP tasks |
| notableImpact | popularized large-scale transformer language models |
| openSourceImplementation | Hugging Face Transformers |
| outputType | text |
| paperAuthors | Alec Radford; Dario Amodei; David Luan; Ilya Sutskever; Jeff Wu; Rewon Child |
| paperTitle | Language Models are Unsupervised Multitask Learners |
| parameterCount | 117M; 345M; 774M; 1.5B |
| predecessor | GPT |
| releaseDate | 2019-02 |
| safetyConcern | potential for generating misinformation; potential for impersonation; potential for spam generation |
| successor | GPT-3 |
| task | language modeling; text completion; text generation; zero-shot learning |
| tokenizerType | Byte Pair Encoding (see the first sketch after this table) |
| trainingDataSource | WebText dataset |
| trainingDataType | web pages |
| trainingObjective | next token prediction (see the first sketch after this table) |
| trainingParadigm | unsupervised learning |
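Two of the statements above, tokenizerType (Byte Pair Encoding) and trainingObjective (next token prediction), are concrete enough to illustrate in code. This sketch again assumes the Hugging Face Transformers implementation named in the table; the example sentences are placeholders, and the 50,257-entry vocabulary size is GPT-2's published figure rather than something stated in this entry.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Byte Pair Encoding: rare words are split into subword units drawn
# from a fixed vocabulary (50,257 entries for GPT-2).
print(tokenizer.tokenize("Autoregressive models predict the next token."))

# Next token prediction: with labels set equal to the inputs, the model
# is scored on predicting each position from the ones before it (the
# one-position shift happens inside the model) -- the unsupervised
# training objective listed above.
enc = tokenizer("GPT-2 was trained on the WebText dataset.",
                return_tensors="pt")
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"mean cross-entropy per token: {loss.item():.2f}")
```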
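The evaluationSetting statement (zero-shot evaluation on downstream tasks) can likewise be sketched: score each candidate label continuation by its likelihood under the model, with no task-specific fine-tuning. The prompt wording and labels here are illustrative assumptions, not from the entry, and the comparison assumes the candidates tokenize to the same length.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "Review: a wonderful, heartfelt film. Sentiment:"
for candidate in [" positive", " negative"]:  # each is a single BPE token
    enc = tokenizer(prompt + candidate, return_tensors="pt")
    with torch.no_grad():
        nll = model(**enc, labels=enc["input_ids"]).loss.item()
    print(f"{candidate.strip():<8} mean NLL = {nll:.3f}")  # lower = preferred
```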
Referenced by (7)
| Subject (surface form when different) | Predicate |
|---|---|
| WebText | associatedWith |
| OpenAI | developed |
| OpenAI ("Generative Pre-trained Transformer") | introducedModelFamily |
| Rewon Child ("Language Models are Unsupervised Multitask Learners") | notableWork |
| GPT-2 ("Language Models are Unsupervised Multitask Learners") | paperTitle |
| GPT-3 | predecessor |
| Hugging Face Transformers | supportsModelType |