Flickr30k

E899059

Flickr30k is an image dataset of roughly 31,000 photographs (31,783 in the published release), each paired with five human-written English captions. It is widely used for training and evaluating image captioning and vision-language models.

Statements (46)

Predicate Object
instanceOf benchmark dataset
image dataset
vision-language dataset
containsContentType activities
everyday scenes
objects
people
domain computer vision
multimodal AI
natural language processing
hasAnnotationGranularity image-level descriptions
hasAnnotationType sentence-level captions
hasApproximateNumberOfImages 31000
hasCaptionsPerImage 5
hasCollectionPlatform Flickr website
hasDataModality images
text captions
hasDataSplit test set
training set
validation set
hasInputFormat image plus multiple captions
hasLanguage English
hasLicense research use
hasNumberOfImages 31783
hasScaleComparedTo larger than Flickr8k
hasTask generate caption from image
retrieve caption from image
retrieve image from caption
hasTotalCaptions 158915
hasTypicalImageResolution variable
imagesSource Flickr
isSuccessorOf Flickr8k
isUsedToEvaluate caption quality
image-text alignment
multimodal representation learning
isWidelyUsedIn academic research
image captioning benchmarks
vision-language evaluation
usedFor benchmarking captioning systems
evaluation of models
image captioning
image-text retrieval
multimodal learning
natural language description of images
training models
vision-language modeling
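
The statements above describe each record as an image plus multiple captions, with five captions per image. A minimal Python sketch of that layout follows. The names `CaptionedImage` and `to_pairs` are hypothetical, and the toy records are made up for illustration; this is not an official loader or the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CaptionedImage:
    """One record: an image plus its reference captions."""
    image_id: str        # hypothetical identifier, e.g. a file name
    captions: List[str]  # five English captions per image, per the statements


def to_pairs(records: List[CaptionedImage]) -> List[Tuple[str, str]]:
    """Flatten records into (image_id, caption) pairs, one pair per caption."""
    return [(r.image_id, c) for r in records for c in r.captions]


# Toy records with placeholder captions; not real Flickr30k data.
records = [
    CaptionedImage("img_001.jpg", [f"placeholder caption {i}" for i in range(5)]),
    CaptionedImage("img_002.jpg", [f"placeholder caption {i}" for i in range(5)]),
]
pairs = to_pairs(records)
print(len(pairs))  # 2 images x 5 captions each
```

Flattened pairs of this kind are one plausible input for the retrieval tasks listed above (caption from image, image from caption). At five captions per image, the pair count is five times the image count, which is how the total caption figure relates to the number of images.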

Referenced by (1)