Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning
Salesforce Marketing Cloud
OCTOBER 28, 2024
At each time step, our model decides whether to attend to the image (and if so, to which regions) or to the visual sentinel, in order to extract meaningful information for sequential word generation. We test our method on the COCO image captioning 2015 challenge dataset and Flickr30K.
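As a rough illustration of the decision described above, the following is a minimal sketch and not the authors' implementation. It assumes the attention scores for the image regions and for the sentinel have already been computed (in a real model they would come from learned layers), and it normalizes them jointly with one softmax. The names `adaptive_context`, `region_scores`, and `sentinel_score` are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_feats, region_scores, sentinel, sentinel_score):
    """Mix image-region features and a sentinel vector for one time step.

    Assumption: scores are given as inputs; this sketch does not learn them.
    Returns (context, beta, region_weights), where beta is the weight placed
    on the sentinel, i.e. how much the model "looks away" from the image.
    """
    # One softmax over all regions plus the sentinel as an extra candidate.
    weights = softmax(list(region_scores) + [sentinel_score])
    beta = weights[-1]
    region_weights = weights[:-1]

    dim = len(sentinel)
    # Start with the sentinel's share, then add each region's share.
    context = [beta * sentinel[d] for d in range(dim)]
    for w, feat in zip(region_weights, region_feats):
        for d in range(dim):
            context[d] += w * feat[d]
    return context, beta, region_weights


# Two image regions with 2-dimensional features, plus a sentinel vector.
regions = [[1.0, 0.0], [0.0, 1.0]]
sentinel_vec = [1.0, 1.0]

# Equal scores: attention is split evenly across regions and sentinel.
ctx_even, beta_even, w_even = adaptive_context(
    regions, [0.0, 0.0], sentinel_vec, 0.0
)

# A much higher sentinel score: the model mostly ignores the image.
ctx_sent, beta_sent, w_sent = adaptive_context(
    regions, [0.0, 0.0], sentinel_vec, 10.0
)
```

In this sketch, a `beta` close to 1 corresponds to relying on the sentinel (for example, when generating a function word), while a `beta` close to 0 corresponds to attending to the image regions.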