🔬 Mech Interp - taylor · Scour

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

🔍Interpretability Academic

Mechanistic Interpretability: The Key to Trusting Agentic AI

🔍Interpretability Discussion

bradenkelley.com

[Paper] Dictionary Learning Identifiability for Understanding SAEs

🔍Interpretability

lesswrong.com

Compositional and interpretable representation of histology using AI foundation models and sparse autoencoders

🔍Interpretability Academic

FoldSAE: Learning to Steer Protein Folding Through Sparse Representations

🔍Interpretability Academic

Playing with Vision Embeddings

🔍Interpretability

prestonbjensen.com · Hacker News

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

🔍Interpretability Academic

Interactions Between Crosscoder Features: A Compact Proofs Perspective

🔍Interpretability Academic

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

🔍Interpretability Academic

Less-relevant results

BioByte 162: The Hype of Virtual Cells, ESMC's AlphaFold3-Like Performance, and the Prediction of Antibody Non-Specificity

🔍Interpretability Blog

decodingbio.substack.com · Substack

Is the Space Pope Reptilian?

🛡️AI Safety News

tearsinrain.ai · Hacker News

Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

🔍Interpretability Academic

SAE It Across Models: Explaining Features With Foreign NLA Verbalizers

🔍Interpretability

lesswrong.com

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

🔍Interpretability Academic

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

🔍Interpretability Academic

Pre-Intervention Prediction of Sparse Autoencoder Steering Side Effects

🔍Interpretability Academic

Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models

🔍Interpretability Academic

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

🔍Interpretability Academic

scMTG reconstructs single-cell temporal dynamics with Markov transition generators

🔍Interpretability Academic

Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes

🔍Interpretability Academic
