Transformer: Attention History May Matter

The Transformer has proven to be a very effective architecture for solving numerous learning tasks across various application fields, such as image captioning, which this work focuses on. Its widespread success is owed to two main ingredients: 1) an attention mechanism and 2) positional encoding. This article focuses on the first ingredient, showing that the vanilla attention mechanism may be improved by exploiting not only the context conveyed in the data sequence under analysis, but also in...
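For readers unfamiliar with the "vanilla" attention mechanism referred to above, here is a minimal sketch of standard scaled dot-product attention in NumPy. It is not the improvement proposed in the article, which is not described in this excerpt. The function name, array shapes, and random inputs are illustrative assumptions.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Standard attention: softmax(q @ k.T / sqrt(d_k)) @ v.

    q: (n_queries, d_k), k: (n_keys, d_k), v: (n_keys, d_v).
    Returns the attended values and the attention weights.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # similarity of each query to each key
    # Subtract the row-wise max before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


# Illustrative inputs (assumed shapes): 3 queries, 5 keys/values.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 2))
out, weights = scaled_dot_product_attention(q, k, v)
```

Each row of `weights` sums to 1, so every output row is a weighted average of the value vectors, computed only from the sequence currently being processed.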
