GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs (opens in new tab)
Activation steering controls model behavior by modifying intermediate hidden states at inference time without retraining. Existing methods handle only single-direction injection; when multiple semantic directions are superposed without constraints, the model collapses. We show that this collapse decomposes into two independently acting sources: distributional deviation, where additive perturbations accumulate in norm across layers and drive acti...
Read the original article