Does RoPE Prevent or Degrade Retrieval Heads? A Mechanistic Analysis Across Model Families
Retrieval heads, attention heads that copy information from earlier context to the current position, have been proposed as the mechanistic substrate for long-context recall. Rotary position embeddings (RoPE) rotate queries and keys by position-dependent angles whose per-dimension frequencies decay geometrically, at a rate set by a base hyperparameter theta. A natural hypothesis is that this rotation either prevents retrieval heads from forming or degrades their function. We test both across four open-weight ...
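
For readers less familiar with RoPE, the rotation described above can be illustrated with a minimal pure-Python sketch. It is not taken from the article: the function names, the default theta of 10000, the interleaved pairing of dimensions, and the toy vectors are all assumptions chosen for illustration, and real implementations often pair dimensions differently. The sketch shows the per-pair frequencies and the property that the query-key score depends only on the relative offset between positions.

```python
import math

def rope_frequencies(head_dim, theta=10000.0):
    # Frequency for pair i is theta ** (-2i / head_dim), so it decays geometrically with i.
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def apply_rope(vec, position, theta=10000.0):
    # Assumption: consecutive pairs (vec[2i], vec[2i+1]) are rotated together.
    out = []
    for i, freq in enumerate(rope_frequencies(len(vec), theta)):
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy query and key vectors with head_dim = 4.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.0, 0.5, -0.5, 1.5]

# Both pairs of positions have the same relative offset of 3.
score_near = dot(apply_rope(q, 10), apply_rope(k, 7))
score_far = dot(apply_rope(q, 1010), apply_rope(k, 1007))
```

Under these assumptions the two scores agree up to floating-point error, because rotating both vectors by the same extra angle leaves their dot product unchanged. A larger theta lowers the slowest frequencies, which is the knob the abstract refers to.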