Megatron-LM

E760435

large-scale language model training framework, open-source software project

Megatron-LM is a large-scale language model training framework developed by NVIDIA. It is designed to train massive transformer models efficiently by combining model parallelism (tensor and pipeline parallelism) with data parallelism.
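To illustrate the tensor-parallel idea named in the description, here is a minimal single-process sketch in pure Python. It splits a linear layer's weight matrix across simulated "ranks" in two ways. In the column-parallel form, each rank computes a slice of the output and the slices are concatenated, which stands in for an all-gather. In the row-parallel form, each rank computes a partial product and the partials are summed, which stands in for an all-reduce. This follows the column-parallel/row-parallel split described in the Megatron-LM paper. However, all function names are hypothetical, no communication library (such as NCCL) is involved, and this is not Megatron-LM's own code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product a @ b (the single-device reference result)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def split_columns(m: Matrix, parts: int) -> List[Matrix]:
    """Split m along its last dimension into `parts` equal shards."""
    cols = len(m[0])
    if cols % parts != 0:
        raise ValueError("last dimension must divide evenly across ranks")
    step = cols // parts
    return [[row[p * step:(p + 1) * step] for row in m] for p in range(parts)]


def split_rows(m: Matrix, parts: int) -> List[Matrix]:
    """Split m along its first dimension into `parts` equal shards."""
    if len(m) % parts != 0:
        raise ValueError("first dimension must divide evenly across ranks")
    step = len(m) // parts
    return [m[p * step:(p + 1) * step] for p in range(parts)]


def column_parallel_linear(x: Matrix, w: Matrix, world: int) -> Matrix:
    """Each simulated rank holds a column slice of w and computes x @ w_slice.
    Concatenating the slices row by row stands in for an all-gather."""
    outputs = [matmul(x, shard) for shard in split_columns(w, world)]
    return [[v for out in outputs for v in out[i]] for i in range(len(x))]


def row_parallel_linear(x: Matrix, w: Matrix, world: int) -> Matrix:
    """Each simulated rank holds a row slice of w and the matching column slice of x.
    Summing the partial products element-wise stands in for an all-reduce."""
    partials = [matmul(xs, ws) for xs, ws in zip(split_columns(x, world), split_rows(w, world))]
    cols = len(partials[0][0])
    return [[sum(p[i][j] for p in partials) for j in range(cols)] for i in range(len(x))]


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8]]
    w = [[1, 0, 2, 1], [0, 1, 1, 2], [3, 1, 0, 1], [1, 2, 1, 0]]
    reference = matmul(x, w)
    print(reference)  # [[14, 13, 8, 8], [34, 29, 24, 24]]
    assert column_parallel_linear(x, w, 2) == reference
    assert row_parallel_linear(x, w, 2) == reference
```

Both sharded forms reproduce the unsharded product exactly. In a transformer MLP, pairing a column-parallel layer with a row-parallel layer means only the final sum needs cross-rank communication. The intermediate activation can stay sharded between the two.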


All labels observed (1)

Label	Occurrences
Megatron-LM (canonical)	1

How this entity was disambiguated

This entity first appeared as the object of triple T8823605; resolving that mention is where its identity was fixed. The disambiguator weighed the candidate entities below and picked the one marked as chosen. Choosing "None of above" mints a new entity. This is how homonymy is resolved: the same surface form can point to different entities.

NED1: entity disambiguation via context triple (model: gpt-5-mini-2025-08-07)

Target entity: Megatron-LM
Context triple: [NCCL, usedBy, Megatron-LM]

A. LLaMA
LLaMA is a family of large language models developed by Meta AI, designed for efficient training and inference across a range of natural language processing tasks.
B. GPT-Neo
GPT-Neo is an open-source family of autoregressive language models developed by EleutherAI as a free alternative to OpenAI’s GPT-3.
C. PaLM 2
PaLM 2 is a large-scale language model developed by Google, known for powering various AI features across Google products before being succeeded by the Gemini family of models.
D. GPT-2
GPT-2 is a large transformer-based language model known for generating coherent, human-like text and sparking widespread discussion about the implications of advanced AI text generation.
E. GPT-3
GPT-3 is a large-scale autoregressive language model known for generating human-like text and performing a wide range of natural language tasks with minimal fine-tuning.
F. None of above. (chosen)
G. Unsure - the case is ambiguous/there is not enough information to decide.

NED2: entity disambiguation via description (model: gpt-5-mini-2025-08-07)

Target entity: Megatron-LM
Target entity description: Megatron-LM is a large-scale language model training framework developed by NVIDIA, designed to efficiently train massive transformer models through model, tensor, and pipeline parallelism.

A. LLaMA
LLaMA is a family of large language models developed by Meta AI, designed for efficient training and inference across a range of natural language processing tasks.
B. GPT-Neo
GPT-Neo is an open-source family of autoregressive language models developed by EleutherAI as a free alternative to OpenAI’s GPT-3.
C. PaLM 2
PaLM 2 is a large-scale language model developed by Google, known for powering various AI features across Google products before being succeeded by the Gemini family of models.
D. GPT-2
GPT-2 is a large transformer-based language model known for generating coherent, human-like text and sparking widespread discussion about the implications of advanced AI text generation.
E. GPT-3
GPT-3 is a large-scale autoregressive language model known for generating human-like text and performing a wide range of natural language tasks with minimal fine-tuning.
F. None of above. (chosen)

Statements (49)

Predicate	Object
instanceOf	large-scale language model training framework; open-source software project
basedOn	PyTorch
developer	NVIDIA
domain	large-scale deep learning; natural language processing
hasFeature	configurable parallelism degrees; fused attention kernels; fused bias and activation operations; fused layer normalization; highly optimized fused CUDA kernels; scalable tokenizer and dataset tools; scripted training pipelines
implements	3D parallelism; activation checkpointing; data parallelism; distributed training; gradient checkpointing; mixed precision training; model parallelism; optimizer state sharding; pipeline parallelism; tensor parallelism
influenced	Megatron-DeepSpeed
license	Apache License 2.0
optimizedFor	NVIDIA GPUs; multi-GPU systems; multi-node clusters
programmingLanguage	Python
repository	https://github.com/NVIDIA/Megatron-LM
supports	BERT-style language models; BF16 precision; FP16 precision; GPT-style language models; T5-style models; ZeRO-style optimizer partitioning; causal language modeling; checkpoints for large models; curriculum learning setups; decoder-only transformer architectures; distributed data loading; encoder-decoder transformer architectures; masked language modeling; sequence-to-sequence training; transformer models
usedFor	scaling transformer models to hundreds of billions of parameters; training very large language models
usedIn	NVIDIA large language model research
uses	NVIDIA NCCL
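Several statements above, including 3D parallelism and configurable parallelism degrees, rest on a simple constraint. In the basic setting, with no other parallelism dimensions, the GPU count factors as tensor-parallel degree × pipeline-parallel degree × data-parallel degree. The helpers below are hypothetical and are not Megatron-LM's API. They compute the data-parallel degree and give each global rank a coordinate in that 3D grid. The sketch assumes tensor-parallel ranks are numbered contiguously, varying fastest. This is an illustrative ordering, chosen so that tensor-parallel groups, which communicate most often, would sit on neighbouring GPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelCoords:
    dp: int  # data-parallel replica index
    pp: int  # pipeline stage index
    tp: int  # tensor-parallel shard index


def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """GPUs left over for data parallelism once the TP and PP degrees are fixed."""
    if world_size < 1 or tp < 1 or pp < 1 or world_size % (tp * pp) != 0:
        raise ValueError("world_size must be a positive multiple of tp * pp")
    return world_size // (tp * pp)


def rank_to_coords(rank: int, world_size: int, tp: int, pp: int) -> ParallelCoords:
    """Map a global rank to (dp, pp, tp), with tensor-parallel ranks varying fastest.
    The ordering is an illustrative assumption, not taken from Megatron-LM's source."""
    data_parallel_size(world_size, tp, pp)  # validates the configuration
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    return ParallelCoords(dp=rank // (tp * pp), pp=(rank // tp) % pp, tp=rank % tp)


if __name__ == "__main__":
    # 16 GPUs, 2-way tensor parallelism, 4 pipeline stages -> 2 data-parallel replicas.
    print(data_parallel_size(16, tp=2, pp=4))  # 2
    print(rank_to_coords(13, 16, tp=2, pp=4))  # ParallelCoords(dp=1, pp=2, tp=1)
```

With this layout, every rank gets a distinct (dp, pp, tp) triple. Changing the tp or pp degree only changes how the same GPUs are carved up, not how many there are.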

How these facts were elicited

Referenced by (1)

Full triples; the surface form is annotated where it differs from this entity's canonical label.

NCCL → usedBy → Megatron-LM
