Image Caption Generation With Adaptive Transformer

EasyChair Preprint 1046

6 pages • Date: May 28, 2019

Wei Zhang, Wenbo Nie, Xinle Li and Yao Yu

Abstract

Image captioning based on the encoder-decoder framework has made promising progress, and the application of various attention mechanisms has greatly improved the performance of captioning models. Improving any part of the framework, or employing a more effective attention mechanism, should benefit overall performance. Based on this idea, we make improvements in two respects. First, we use a more powerful decoder. Recent work shows that the Transformer is superior to the LSTM in efficiency and performance on some NLP tasks, so we replace the traditional LSTM decoder with a Transformer to accelerate training. Second, we incorporate spatial attention and adaptive attention into the Transformer, which lets the decoder determine where and when to use image region information. We evaluate this method on the Flickr30k dataset and achieve better results.
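To illustrate the "where and when" idea in the abstract, the following is a minimal NumPy sketch of adaptive attention in its commonly described form: the decoder query attends over the image regions plus one extra "sentinel" vector, and the weight placed on the sentinel acts as a gate between visual and non-visual information. This is not the authors' implementation. The function name adaptive_attention, the scaled dot-product scoring, the vector shapes, and the random inputs are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_attention(query, regions, sentinel):
    """Hypothetical sketch: query (d,), regions (k, d), sentinel (d,)."""
    d = query.shape[0]
    candidates = np.vstack([regions, sentinel])   # (k + 1, d)
    scores = candidates @ query / np.sqrt(d)      # scaled dot-product scores

    # "Where": spatial attention over the k image regions only.
    alpha = softmax(scores[:-1])
    spatial_context = alpha @ regions

    # "When": the sentinel's share of the (k + 1)-way softmax is the gate.
    beta = softmax(scores)[-1]
    context = beta * sentinel + (1.0 - beta) * spatial_context
    return context, alpha, beta

# Illustrative inputs only (assumed sizes: 5 regions, 8-dim features).
rng = np.random.default_rng(0)
query = rng.normal(size=8)
regions = rng.normal(size=(5, 8))
sentinel = rng.normal(size=8)
context, alpha, beta = adaptive_attention(query, regions, sentinel)
```

In this sketch, a beta close to 1 means the decoder relies mostly on the sentinel (for example, when emitting a function word), while a beta close to 0 means it relies mostly on the attended image regions.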

Keyphrases: Adaptive Attention, image caption, transformer

Links:

https://easychair.org/publications/preprint/LPLX

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:1046,
  author    = {Wei Zhang and Wenbo Nie and Xinle Li and Yao Yu},
  title     = {Image Caption Generation With Adaptive Transformer},
  howpublished = {EasyChair Preprint 1046},
  year      = {EasyChair, 2019}}
