Show, Attend and Tell

E899062

Show, Attend and Tell is a neural image captioning model that introduced visual attention mechanisms, dynamically focusing on different parts of an image while generating each word of the caption.
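The core idea can be sketched as a soft-attention step: score each image region against the decoder state, normalize the scores with a softmax, and take the weighted sum of region features as the context vector. This is a minimal illustration using a dot-product score for brevity (the paper itself uses a small MLP alignment model); all names and shapes here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_attention(region_features, hidden_state, W_a):
    """Weight L region feature vectors by relevance to the decoder state.

    region_features: (L, D) array, one D-dim CNN feature per image region.
    hidden_state:    (H,) current decoder (e.g. LSTM) hidden state.
    W_a:             (H, D) learned projection used to score regions.
    """
    # Score each region against the projected hidden state (dot-product
    # form for brevity; the paper scores regions with an MLP instead).
    scores = region_features @ (W_a.T @ hidden_state)   # (L,)
    # Softmax turns scores into attention weights alpha summing to 1.
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    # Context vector: expectation of region features under alpha.
    context = alpha @ region_features                   # (D,)
    return context, alpha

# Hypothetical shapes: a 14x14 grid of 512-dim conv features, 1000-dim state.
rng = np.random.default_rng(0)
feats = rng.normal(size=(196, 512))
h = rng.normal(size=(1000,))
W = rng.normal(size=(1000, 512)) * 0.01
ctx, alpha = soft_attention(feats, h, W)
print(ctx.shape, float(alpha.sum()))
```

The hard-attention variant listed below instead samples a single region from `alpha` and is trained with REINFORCE, since the sampling step is not differentiable.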


Statements (48)

Predicate Object
instanceOf attention-based model
deep learning model
neural image captioning model
affiliationContext University of Montreal
approach encoder-decoder architecture
citationImpact highly cited in vision-language research
demonstrates visualization of attention maps over image regions
domain computer vision
natural language processing
evaluationDataset Flickr30k
Flickr8k
MSCOCO
focusesOn different parts of an image during caption generation
goal generate descriptive natural language captions for images
hasAuthor Aaron Courville
Jimmy Ba
Kelvin Xu
Kyunghyun Cho
Richard Zemel
Ruslan Salakhutdinov
Ryan Kiros
Yoshua Bengio
hasFullName Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
improvesOver non-attention image captioning models
influenced transformer-based image captioning models
inspired later attention-based vision-language models
introduced visual attention mechanism for image captioning
language English captions
learningParadigm supervised learning
property dynamically attends to image regions at each word step
publicationType conference paper
publicationYear 2015
publishedIn International Conference on Machine Learning
publishedInShort ICML
task image caption generation
uses CNN features as image encoder
RNN language model
alignment model between image regions and words
hard attention
soft attention
visual attention
usesDecoder LSTM
recurrent neural network
usesEncoder convolutional neural network
usesTrainingMethod REINFORCE for hard attention approximation
backpropagation through time
stochastic gradient descent
usesTrainingObjective maximum likelihood estimation

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.