VisionEncoderDecoderModel

E435885

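VisionEncoderDecoderModel is a generic model class in Hugging Face Transformers. It combines a Transformer-based vision encoder (for example ViT, BEiT, DeiT, or Swin) with a text decoder (for example GPT-2 or BERT) to generate text from images, for tasks such as image captioning, optical character recognition, and visual question answering.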
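The following minimal sketch shows how the encoder and decoder are composed. It assumes `torch` and `transformers` are installed. To avoid downloading checkpoints, it builds tiny, randomly initialized ViT and GPT-2 configurations. The sizes are arbitrary illustrative choices. In practice, pretrained weights are typically loaded with `VisionEncoderDecoderModel.from_encoder_decoder_pretrained`.

```python
import torch
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

torch.manual_seed(0)

# Tiny, randomly initialized vision encoder (illustrative sizes):
# 32x32 RGB images split into 8x8 patches -> 16 patches plus a [CLS] token.
encoder_config = ViTConfig(
    image_size=32,
    patch_size=8,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
)

# Tiny, randomly initialized text decoder with a 100-token vocabulary (illustrative sizes).
decoder_config = GPT2Config(
    vocab_size=100,
    n_positions=64,
    n_embd=64,
    n_layer=2,
    n_head=4,
)

# Combine the two configs; this marks the decoder as a decoder and enables cross-attention.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = VisionEncoderDecoderModel(config=config)
model.eval()

pixel_values = torch.randn(2, 3, 32, 32)           # (batch, channels, height, width)
decoder_input_ids = torch.randint(0, 100, (2, 5))  # (batch, target_length)

with torch.no_grad():
    outputs = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)

logits = outputs.logits                            # (batch, target_length, vocab_size)
encoder_states = outputs.encoder_last_hidden_state # (batch, patches + 1, hidden_size)
print(tuple(logits.shape))  # (2, 5, 100)
```

The encoder turns the image into a sequence of patch embeddings. Each decoder layer attends over that sequence through cross-attention while predicting the next token. Because both hidden sizes are 64 here, no projection between encoder and decoder is needed.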


Referenced by (1)
Subject | Predicate
Hugging Face Transformers | supportsModelType
