🔄 Transformers | Specific | Attention Mechanisms, Large Language Models, BERT, Encoder-Decoder Architecture