Bloom

E435874

Bloom is a large open-access multilingual language model developed by the BigScience research workshop for text generation and understanding tasks.

All labels observed (1)

Label Occurrences
Bloom canonical 4

Statements (48)

Predicate Object
instanceOf autoregressive transformer model
large language model
multilingual language model
accessModel open-access
architecture decoder-only transformer
contextWindowSize 2048 tokens
developer BigScience Research Workshop
BigScience community
BigScience workshop
Hugging Face
hostingPlatform Hugging Face Hub
intendedUse downstream NLP applications
experimentation
research
languageSupport multilingual
license Responsible AI License (RAIL) variant
notableFeature one of the first open-access LLMs at 100B+ parameters
parameterCount 176 billion
projectType collaborative international research project
releaseDate 2022
safetyConsideration subject to content and usage restrictions via license
supportsLanguage Arabic
Chinese
English
French
German
Hindi
Portuguese
Russian
Spanish
dozens of other languages
task language modeling
text generation
text understanding
tokenizerType byte-level BPE
subword tokenizer
trainingComputeType GPU cluster
trainingDataSize over 300 billion tokens
trainingDataSource ROOTS corpus
trainingDataType academic publications
books
code
web text
trainingDuration approximately 3.5 months
trainingHardware Jean Zay supercomputer
trainingHardwareProvider GENCI
IDRIS
trainingObjective causal language modeling
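
The parameterCount and trainingDataSize statements above imply some rough scale figures. The sketch below works through that arithmetic. The bytes-per-parameter widths are assumptions chosen for illustration and are not stated in this entry, and the token count is taken as the lower bound "300 billion".

```python
# Back-of-the-envelope figures derived from the statements above.
PARAMETER_COUNT = 176e9   # parameterCount: 176 billion
TRAINING_TOKENS = 300e9   # trainingDataSize: "over 300 billion tokens" (lower bound)

# Assumed storage widths per parameter, in bytes; not stated in this entry.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(param_count, bytes_per_param):
    """Return the raw weight size in gigabytes (1 GB = 1e9 bytes)."""
    return param_count * bytes_per_param / 1e9


# Raw weight size for each assumed precision.
weight_sizes_gb = {
    name: weight_memory_gb(PARAMETER_COUNT, width)
    for name, width in BYTES_PER_PARAM.items()
}

# Training tokens seen per parameter, using the lower-bound token count.
tokens_per_param = TRAINING_TOKENS / PARAMETER_COUNT

for name, size in weight_sizes_gb.items():
    print(f"{name}: {size:.0f} GB")
print(f"tokens per parameter (lower bound): {tokens_per_param:.2f}")
```

These figures cover the weights only. Activations, optimizer state, and any serving overhead are outside this estimate.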

Referenced by (4)

Full triples — surface form annotated when it differs from this entity's canonical label.