Flickr8k

E899058

benchmark dataset, image-captioning dataset, vision-language dataset

Flickr8k is a benchmark image-captioning dataset consisting of 8,000 images each paired with multiple human-written descriptions, widely used for training and evaluating vision-language models.


All labels observed (1)

Label	Occurrences
Flickr8k (canonical)	1

How this entity was disambiguated

This entity first appeared as the object of triple T11003511 — resolving that mention is where its identity was fixed. The disambiguator weighed these candidate entities and picked the highlighted one (or “None”, minting a new entity). This is how homonymy is resolved: the same surface form can point to different entities.

NED1 Entity disambiguation (via context triple) gpt-5-mini-2025-08-07

Target entity: Flickr8k
Context triple: [Show and Tell: A Neural Image Caption Generator, usesDataset, Flickr8k]

A. Flickr
Flickr is an online photo and video hosting and sharing platform that became one of the earliest popular social media sites for photographers and casual users alike.
B. Britannica ImageQuest
Britannica ImageQuest is a curated educational image database offering millions of rights-cleared photos and illustrations for teaching and learning.
C. Images and Words
Images and Words is a landmark 1992 progressive metal album by Dream Theater, widely credited with bringing the band mainstream recognition and defining their signature sound.
D. IMG
IMG is a global sports, events, and talent management company known for representing athletes and models and producing major sporting and fashion events.
E. Images
Images is a set of impressionistic piano pieces by Claude Debussy that evoke vivid, atmospheric soundscapes through innovative harmony and tone color.
F. None of the above. (chosen)
G. Unsure - the case is ambiguous/there is not enough information to decide.

NED2 Entity disambiguation (via description) gpt-5-mini-2025-08-07

Target entity: Flickr8k
Target entity description: Flickr8k is a benchmark image-captioning dataset consisting of 8,000 images each paired with multiple human-written descriptions, widely used for training and evaluating vision-language models.

A. Flickr
Flickr is an online photo and video hosting and sharing platform that became one of the earliest popular social media sites for photographers and casual users alike.
B. Britannica ImageQuest
Britannica ImageQuest is a curated educational image database offering millions of rights-cleared photos and illustrations for teaching and learning.
C. Images and Words
Images and Words is a landmark 1992 progressive metal album by Dream Theater, widely credited with bringing the band mainstream recognition and defining their signature sound.
D. IMG
IMG is a global sports, events, and talent management company known for representing athletes and models and producing major sporting and fashion events.
E. Images
Images is a set of impressionistic piano pieces by Claude Debussy that evoke vivid, atmospheric soundscapes through innovative harmony and tone color.
F. None of the above. (chosen)

Statements (46)

Predicate	Object
instanceOf	benchmark dataset; image-captioning dataset; vision-language dataset
hasAnnotationType	image captions; natural language descriptions
hasApproximateNumberOfImages	8000
hasBenchmarkRole	baseline dataset for image captioning
hasCaptionSource	human annotators
hasCaptionsPerImage	5
hasDataModality	images; text captions
hasDataType	photographic images
hasDescriptionQuality	human-written captions
hasDomain	computer vision; natural language processing
hasEvaluationMetrics	BLEU; CIDEr; METEOR; ROUGE
hasImageSource	Flickr
hasLanguage	English
hasLicenseType	research use
hasName	Flickr8k
hasNumberOfImages	8000
hasScale	small-scale image-captioning dataset
hasSource	Flickr
hasTask	multimodal learning; vision-language grounding
hasTypicalSplit	test set; training set; validation set
isBenchmarkFor	automatic image description; multimodal representation learning
isConsidered	standard benchmark in image captioning
isSmallerThan	Flickr30k; MS COCO
isUsedIn	vision-and-language research
isUsedTo	compare image-captioning algorithms; evaluate caption generation quality
isWidelyUsedFor	benchmarking captioning models
isWidelyUsedIn	academic research
usedFor	evaluating image-captioning models; evaluating vision-language models; image captioning; training image-captioning models; training vision-language models
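
The statements above can be read as (subject, predicate, object) triples with Flickr8k as the subject. The following is a minimal sketch, not the page's own tooling: it copies a handful of rows from the table, groups objects by predicate, and derives the approximate total number of captions implied by the image count and captions-per-image values. The names `ROWS` and `build_index` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# A few (predicate, object) rows copied from the Statements table.
# The selection is illustrative, not the full set of 46 statements.
ROWS = [
    ("instanceOf", "benchmark dataset"),
    ("instanceOf", "image-captioning dataset"),
    ("hasApproximateNumberOfImages", "8000"),
    ("hasCaptionsPerImage", "5"),
    ("isSmallerThan", "Flickr30k"),
    ("isSmallerThan", "MS COCO"),
]


def build_index(subject, rows):
    """Group objects by predicate for one subject (hypothetical helper)."""
    index = defaultdict(list)
    for predicate, obj in rows:
        index[predicate].append(obj)
    return {"subject": subject, "statements": dict(index)}


entity = build_index("Flickr8k", ROWS)

# Approximate caption count implied by the two numeric statements.
images = int(entity["statements"]["hasApproximateNumberOfImages"][0])
captions_per_image = int(entity["statements"]["hasCaptionsPerImage"][0])
approx_total_captions = images * captions_per_image

print(approx_total_captions)  # 40000
```

Because the image count is itself recorded as approximate, the derived caption total should be treated as approximate as well.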

How these facts were elicited

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

Show and Tell: A Neural Image Caption Generator → usesDataset → Flickr8k
