multimodal large language model family

C16888
concept

A multimodal large language model family is a group of related neural models that jointly process and generate multiple data modalities (such as text, images, audio, or video) using a shared architecture, shared training objectives, and shared parameterization.

Observed surface forms (7)

Surface form                      Occurrences
multimodal large language model   3
AI model family                   1
AI model family variant           1

Instances (8)

Instance                                            Via concept surface
Google Gemini
LayoutLM multimodal transformer model
Gretchen Krueger (surface form: CLIP)               multimodal AI model
GPT-4o multimodal large language model
Gemini Ultra large multimodal language model
Gemini 1.5 multimodal foundation model
Gemini 2.0 multimodal large language model
Gemini 2.0 Flash multimodal large language model