multimodal large language model family
C16888
concept
A multimodal large language model family is a group of related neural models that can jointly process and generate multiple data modalities—such as text, images, audio, or video—using shared architectures, training objectives, and parameterizations.
Observed surface forms (7)
| Surface form | Occurrences |
|---|---|
| multimodal large language model | 3 |
| AI model family | 1 |
| AI model family variant | 1 |
| large multimodal language model | 1 |
| multimodal AI model | 1 |
| multimodal foundation model | 1 |
| multimodal transformer model | 1 |
Instances (8)
| Instance | Via concept surface |
|---|---|
| Google Gemini | — |
| LayoutLM | multimodal transformer model |
| Gretchen Krueger (surface form: CLIP) | multimodal AI model |
| GPT-4o | multimodal large language model |
| Gemini Ultra | large multimodal language model |
| Gemini 1.5 | multimodal foundation model |
| Gemini 2.0 | multimodal large language model |
| Gemini 2.0 Flash | multimodal large language model |