Superposition as Lossy Compression: Measure with Sparse Autoencoders and Connect to Adversarial Vulnerability
arxiv.org·1d


Abstract: Neural networks achieve remarkable performance through superposition: encoding multiple features as overlapping directions in activation space rather than dedicating individual neurons to each feature. This challenges interpretability, yet we lack principled methods to measure superposition. We present an information-theoretic framework measuring a neural representation’s effective degrees of freedom. We apply Shannon entropy to sparse autoencoder activations to compute the number of effective features as the minimum neurons needed for interference-free encoding. Equivalently, this measures how many "virtual neurons" the network simulates through superposition. When …
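The abstract does not spell out the formula, so the following is only a sketch of one plausible reading of "Shannon entropy applied to sparse autoencoder activations": treat each feature's share of total absolute activation as a probability, compute the entropy of that distribution, and exponentiate it to get an effective count. The function name `effective_features`, the use of mean absolute activation as the weighting, and the natural-log base are all assumptions made for illustration, not the authors' stated method.

```python
import math


def effective_features(activations):
    """Hypothetical effective-feature count: exp of the Shannon entropy
    of each feature's share of total absolute activation.

    `activations` is a list of rows (one per input), each row holding
    one sparse autoencoder activation per feature. This weighting is an
    assumption, not taken from the paper.
    """
    n_features = len(activations[0])

    # Accumulate absolute activation mass per feature across all inputs.
    mass = [0.0] * n_features
    for row in activations:
        for i, a in enumerate(row):
            mass[i] += abs(a)

    total = sum(mass)
    if total == 0:
        raise ValueError("all activations are zero")

    # Shannon entropy (in nats) of the normalized mass distribution.
    entropy = 0.0
    for m in mass:
        if m > 0:
            p = m / total
            entropy -= p * math.log(p)

    # exp(entropy) is the number of equally used features that would
    # produce the same entropy.
    return math.exp(entropy)
```

Under this reading, a representation that uses eight features equally yields a count of about 8, while one dominated by a single feature yields a count of about 1. Dividing the count by the layer's actual neuron count would give a rough ratio of "virtual neurons" per physical neuron, again as an illustrative assumption.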
