A Few Notes on the Transformer

Figure: A self-attention block depicted as a neural network.

In this post I will describe the attention mechanism commonly used in transformers, a popular neural language architecture. Most of the best-known recent large language models are based on the transformer architecture. The transformer's form of attention was described in "Attention Is All You Need" by Vaswani et al. What is attention? At a high level, attention is a mechanism for neural networks to boost portions of an input that are relevant and ignore ...
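As a rough illustration of what "boosting relevant portions of an input" can look like, here is a minimal sketch of scaled dot-product attention, the formulation given in Vaswani et al. The function names and the toy token vectors are hypothetical, chosen only for this example, and the learned query, key, and value projections that a real transformer applies are omitted for brevity.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over plain Python lists.

    Each query is compared with every key; the resulting weights say
    how much of each value flows into that query's output.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)  # non-negative, sums to 1
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy input (assumption): three 2-dimensional "token" vectors.
# Self-attention uses the same vectors as queries, keys, and values.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

In this toy input the first token is more similar to the second than to the third, so its attention weights favor the second token: similar inputs are boosted and dissimilar ones are down-weighted, though never set exactly to zero.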
