Show, Attend and Tell
E899062
Show, Attend and Tell is a neural image captioning model that introduced visual attention mechanisms which dynamically focus on different parts of an image as each word of the caption is generated.
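At each time step $t$, the model scores every CNN annotation vector $a_i$ against the previous decoder state $h_{t-1}$ and forms a context vector from the resulting weights. A minimal restatement of the paper's soft-attention formulation, in its own notation:

```latex
e_{ti} = f_{\text{att}}(a_i, h_{t-1}), \qquad
\alpha_{ti} = \frac{\exp(e_{ti})}{\sum_{k=1}^{L} \exp(e_{tk})}, \qquad
\hat{z}_t = \sum_{i=1}^{L} \alpha_{ti}\, a_i
```

Soft attention takes this expectation over all $L$ regions and stays fully differentiable; hard attention instead samples a single region index from the multinomial $\alpha_t$ and requires the REINFORCE-style training listed in the statements below.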
Statements (48)
| Predicate | Object |
|---|---|
| instanceOf | attention-based model; deep learning model; neural image captioning model |
| affiliationContext | University of Montreal |
| approach | encoder-decoder architecture |
| citationImpact | highly cited in vision-language research |
| demonstrates | visualization of attention maps over image regions |
| domain | computer vision; natural language processing |
| evaluationDataset | Flickr30k; Flickr8k; MSCOCO |
| focusesOn | different parts of an image during caption generation |
| goal | generate descriptive natural language captions for images |
| hasAuthor | Aaron Courville; Jimmy Ba; Kelvin Xu; Kyunghyun Cho; Richard Zemel; Ruslan Salakhutdinov; Ryan Kiros; Yoshua Bengio |
| hasFullName | Show, Attend and Tell: Neural Image Caption Generation with Visual Attention |
| improvesOver | non-attention image captioning models |
| influenced | transformer-based image captioning models |
| inspired | later attention-based vision-language models |
| introduced | visual attention mechanism for image captioning |
| language | English captions |
| learningParadigm | supervised learning |
| property | dynamically attends to image regions at each word step |
| publicationType | conference paper |
| publicationYear | 2015 |
| publishedIn | International Conference on Machine Learning |
| publishedInShort | ICML |
| task | image caption generation |
| uses | CNN features as image encoder; RNN language model; alignment model between image regions and words; hard attention; soft attention; visual attention |
| usesDecoder | LSTM; recurrent neural network |
| usesEncoder | convolutional neural network |
| usesTrainingMethod | REINFORCE for hard attention approximation; backpropagation through time; stochastic gradient descent |
| usesTrainingObjective | maximum likelihood estimation |
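The statements above only name the components; the sketch below shows how they might fit together at a single decoding step. This is an illustrative PyTorch reconstruction under assumed shapes and layer names, not the authors' released code (the paper uses L = 196 VGG feature locations of dimension D = 512; the hidden, embedding, and vocabulary sizes here are placeholders).

```python
# Minimal soft-attention decoding step in the spirit of Show, Attend and Tell.
# Sizes and layer names are illustrative assumptions, not the paper's exact
# configuration (the paper: L=196 regions, D=512 from VGG conv features).
import torch
import torch.nn as nn

L, D, H, E, V = 196, 512, 1024, 256, 10000  # regions, feat dim, hidden, embed, vocab


class SoftAttentionStep(nn.Module):
    def __init__(self):
        super().__init__()
        self.att_feat = nn.Linear(D, H)    # projects each region feature a_i
        self.att_hid = nn.Linear(H, H)     # projects previous hidden state h_{t-1}
        self.att_score = nn.Linear(H, 1)   # scalar attention score e_{ti} per region
        self.embed = nn.Embedding(V, E)
        self.lstm = nn.LSTMCell(E + D, H)  # input: word embedding + context vector
        self.out = nn.Linear(H, V)

    def forward(self, feats, prev_word, h, c):
        # feats: (B, L, D) CNN annotation vectors; prev_word: (B,); h, c: (B, H)
        scores = self.att_score(torch.tanh(
            self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))  # (B, L, 1)
        alpha = torch.softmax(scores.squeeze(-1), dim=1)           # (B, L)
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)         # (B, D) = E[z_t]
        x = torch.cat([self.embed(prev_word), context], dim=1)
        h, c = self.lstm(x, (h, c))
        return self.out(h), alpha, h, c  # word logits + attention map + new state
```

With soft attention the whole step is differentiable, so training can maximize the caption log-likelihood directly via backpropagation through time and stochastic gradient descent, matching the usesTrainingMethod and usesTrainingObjective statements; the hard-attention variant replaces the weighted sum with a sampled region and needs the REINFORCE approximation listed above.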
Referenced by (1)
Full triples — surface form annotated when it differs from this entity's canonical label.