vision-language dataset

C63730
concept

A vision-language dataset is a curated collection of visual data (such as images or videos) paired with corresponding textual annotations, designed to train and evaluate models that jointly understand and generate visual and linguistic information.
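The core structure of such a dataset, with visual items each linked to one or more text annotations, can be sketched as follows. This is a minimal illustration under stated assumptions. The `CaptionPair` record, its field names, the `group_by_image` helper, and the file names and captions are all hypothetical. They are not taken from the MSCOCO, Flickr8k, or Flickr30k distribution formats or from any library API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionPair:
    """One image-text pair. Field names are hypothetical, not a real dataset schema."""
    image_path: str
    caption: str


def group_by_image(pairs):
    """Collect every caption attached to each image, preserving input order."""
    grouped = defaultdict(list)
    for pair in pairs:
        grouped[pair.image_path].append(pair.caption)
    return dict(grouped)


# Illustrative records only; the paths and captions are invented for this sketch.
pairs = [
    CaptionPair("img_001.jpg", "A dog runs across a grassy field."),
    CaptionPair("img_001.jpg", "A brown dog playing outside."),
    CaptionPair("img_002.jpg", "Two people ride bicycles down a street."),
]
captions = group_by_image(pairs)
```

Grouping by image reflects the fact that a single image can carry several independent captions. A model can then be trained on individual pairs and evaluated against all reference captions for an image.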

Observed surface forms (2)

Surface form                 Occurrences
image captioning dataset     1
image-captioning dataset     1

Instances (3)

Instance     Via concept surface
MSCOCO       image captioning dataset
Flickr8k     image-captioning dataset
Flickr30k