Megatron-LM

E760435

Megatron-LM is an open-source framework developed by NVIDIA for training very large transformer language models efficiently by combining data parallelism with tensor and pipeline model parallelism.

All labels observed (1)

Label Occurrences
Megatron-LM canonical 1

Statements (49)

Predicate Object
instanceOf large-scale language model training framework
open-source software project
basedOn PyTorch
developer NVIDIA
domain large-scale deep learning
natural language processing
hasFeature configurable parallelism degrees
fused attention kernels
fused bias and activation operations
fused layer normalization
highly optimized fused CUDA kernels
scalable tokenizer and dataset tools
scripted training pipelines
implements 3D parallelism
activation checkpointing
data parallelism
distributed training
gradient checkpointing
mixed precision training
model parallelism
optimizer state sharding
pipeline parallelism
tensor parallelism
influenced Megatron-DeepSpeed
license Apache License 2.0
optimizedFor NVIDIA GPUs
multi-GPU systems
multi-node clusters
programmingLanguage Python
repository https://github.com/NVIDIA/Megatron-LM
supports BERT-style language models
BF16 precision
FP16 precision
GPT-style language models
T5-style models
ZeRO-style optimizer partitioning
causal language modeling
checkpoints for large models
curriculum learning setups
decoder-only transformer architectures
distributed data loading
encoder-decoder transformer architectures
masked language modeling
sequence-to-sequence training
transformer models
usedFor scaling transformer models to hundreds of billions of parameters
training very large language models
usedIn NVIDIA large language model research
uses NVIDIA NCCL
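
The "3D parallelism" and "configurable parallelism degrees" statements can be illustrated with a small sketch of how the three degrees relate. It assumes the common convention that the total GPU count equals the product of the tensor-parallel, pipeline-parallel, and data-parallel degrees, so the data-parallel degree follows from the other two. The ParallelLayout class and its field names are hypothetical helpers written for this illustration; they are not part of the Megatron-LM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper (not a Megatron-LM class) describing a 3D layout."""

    world_size: int         # total number of GPUs (ranks) in the job
    tensor_parallel: int    # GPUs that jointly hold one layer's weight shards
    pipeline_parallel: int  # number of pipeline stages the layer stack is split into

    @property
    def model_parallel(self) -> int:
        # GPUs needed to hold one full replica of the model.
        return self.tensor_parallel * self.pipeline_parallel

    @property
    def data_parallel(self) -> int:
        # Assumed convention: world_size = tensor * pipeline * data.
        if self.world_size % self.model_parallel != 0:
            raise ValueError(
                "world_size must be divisible by tensor_parallel * pipeline_parallel"
            )
        return self.world_size // self.model_parallel


if __name__ == "__main__":
    layout = ParallelLayout(world_size=64, tensor_parallel=8, pipeline_parallel=4)
    # 8 * 4 = 32 GPUs per model replica, so 64 GPUs hold 2 replicas.
    print(layout.model_parallel, layout.data_parallel)
```

Under this assumption, raising the tensor or pipeline degree lets a larger model fit, at the cost of fewer data-parallel replicas for a fixed number of GPUs.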

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

NCCL usedBy Megatron-LM