attention mechanism, self-attention, transformer architecture, BERT, GPT