YaRN: Efficient Context Window Extension of Large Language Models

arXiv:2309.00071v3. Abstract: Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x fewer tokens and 2.5x fewer training steps than previous methods. Using YaRN, we...
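To make the abstract's starting point concrete, here is a minimal sketch of plain RoPE in pure Python. It is not YaRN itself, whose details the excerpt above does not give. The function names, the toy vectors, and the base of 10000 (the value commonly used with RoPE) are assumptions chosen for illustration. The sketch rotates each pair of vector components by an angle proportional to the token position, then checks that the attention score between a query and a key depends only on their relative offset.

```python
import math


def rope_angles(position, head_dim, base=10000.0):
    """Rotation angle for each 2-D component pair at a given token position."""
    return [position * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by their position-dependent angle."""
    out = []
    for i, theta in enumerate(rope_angles(position, len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(a, b):
    """Plain dot product, standing in for an unnormalized attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (hypothetical values, head_dim = 4).
q = [1.0, 0.5, -0.3, 0.8]
k = [0.2, -0.7, 0.9, 0.4]

# Same relative offset (3) at two different absolute positions.
score_near = dot(apply_rope(q, 5), apply_rope(k, 2))
score_far = dot(apply_rope(q, 105), apply_rope(k, 102))

print(abs(score_near - score_far) < 1e-9)
```

The score is a function of the offset alone, which is how RoPE encodes relative position. A model trained on short sequences has only seen rotation angles for positions up to its training length, which is one common way to picture the generalization failure the abstract describes.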

Covered in 3 articles

KDnuggets: 5 Small Language Models for Agentic Tool Calling

huggingface.co: unsloth/Qwen3-8B-GGUF

aleksagordic.com: Inside the Transformer: The Life of a Token

Discussed on Hacker News