YaRN: Efficient Context Window Extension of Large Language Models
arXiv:2309.00071v3. Abstract: Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x fewer tokens and 2.5x fewer training steps than previous methods. Using YaRN, we...
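For readers unfamiliar with the rotary embeddings the abstract refers to, here is a minimal sketch of the basic RoPE rotation in plain Python. It illustrates only how RoPE encodes position, not the YaRN method itself. The function names `rope_rotate` and `dot`, the example vectors, and the base of 10000 are assumptions chosen for illustration and are not taken from the abstract.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of an even-length vector by position-dependent angles."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        # Each pair of dimensions rotates at its own frequency (assumed base 10000).
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors, for illustration only.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# The same relative offset (3) at two different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_far = dot(rope_rotate(q, 105), rope_rotate(k, 102))
```

Under these assumptions, `score_near` and `score_far` agree up to floating-point error, because the dot product of two rotated vectors depends only on the difference between their positions. The rotation angles themselves grow with absolute position, so a model sees angles at long positions that it never encountered during training, which is the setting the abstract describes.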