AI Interpretability

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

 🎯Reranking  Content type: Academic
arxiv.org·

rentruewang/inversql: Create SQL that match your selection (with explainable AI), not the other way around

 🗃️databases  Content type: Code
github.com · Hacker News

Is the Space Pope Reptilian?

 🎛️interfaces  Content type: News
tearsinrain.ai · Hacker News

Playing with Vision Embeddings

 🔍Semantic Search

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

 🎯Reranking  Content type: Academic
arxiv.org·

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

 📍embeddings  Content type: Academic
arxiv.org·

Interactions Between Crosscoder Features: A Compact Proofs Perspective

 🎯Reranking  Content type: Academic
arxiv.org·

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

 🌫Diffusion language models  Content type: Academic
arxiv.org·

Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

 📍embeddings  Content type: Academic
arxiv.org·

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

 🎯Reranking  Content type: Academic
arxiv.org·

One Lens, Many Worlds: A Capability-Typed Interface for World-Model Interpretability

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

Pre-Intervention Prediction of Sparse Autoencoder Steering Side Effects

 🎯Reranking  Content type: Academic
arxiv.org·

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

 🎯Reranking  Content type: Academic
arxiv.org·

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

 🎯Reranking  Content type: Academic
arxiv.org·

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

 🔍Semantic Search  Content type: Academic
arxiv.org·

Mechanistic Analysis of Alignment Algorithms in Language Models

 🌫Diffusion language models  Content type: Academic
arxiv.org·

Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes

 Context engineering  Content type: Academic
arxiv.org·

Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers

 🧪Property-based Testing  Content type: Academic
arxiv.org·

Temporal Preference Concepts and their Functions in a Large Language Model

 🌫Diffusion language models  Content type: Academic
arxiv.org·

TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment

 🎯Reranking  Content type: Academic
arxiv.org·
