Attention Mechanisms, Large Language Models, BERT, Encoder-Decoder Architecture