AI Interpretability

Mechanistic Interpretability: The Key to Trusting Agentic AI

 🛡️AI Safety  Content type: Discussion
bradenkelley.com

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

 🗄️Vector Databases  Content type: Academic
arxiv.org

Compositional and interpretable representation of histology using AI foundation models and sparse autoencoders

 🔬AI Research  Content type: Academic
biorxiv.org

[Paper] Dictionary Learning Identifiability for Understanding SAEs

 🔬AI Research
lesswrong.com

Playing with Vision Embeddings

 🔬AI Research

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

 🛡️AI Safety  Content type: Academic
arxiv.org

Coelho Mollo and Millière: The Vector Grounding Problem

 🔬AI Research

Less-relevant results

BioByte 162: The Hype of Virtual Cells, ESMC's AlphaFold3-Like Performance, and the Prediction of Antibody Non-Specificity

 🔬AI Research  Content type: Blog

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

 🛡️AI Safety  Content type: Academic
arxiv.org

Who Elected Anthropic?

 🛡️AI Safety  Content type: Blog

The technical community can't be the main character in AI safety anymore

 🛡️AI Safety
substackcdn.com · Substack

FoldSAE: Learning to Steer Protein Folding Through Sparse Representations

 🔬AI Research  Content type: Academic
arxiv.org

scMTG reconstructs single-cell temporal dynamics with Markov transition generators

 🌐Open Source  Content type: Academic
biorxiv.org

Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models

 🧠LLMs  Content type: Academic
arxiv.org

Thoughts on 'Learning Mechanics'

 🔬AI Research
lesswrong.com

Interactions Between Crosscoder Features: A Compact Proofs Perspective

 🛡️AI Safety  Content type: Academic
arxiv.org

Pre-Intervention Prediction of Sparse Autoencoder Steering Side Effects

 🧠LLMs  Content type: Academic
arxiv.org

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

 🧠Language Models  Content type: Academic
arxiv.org

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

 🔬AI Research  Content type: Academic
arxiv.org

SAE It Across Models: Explaining Features With Foreign NLA Verbalizers

 🧠LLMs
lesswrong.com
