AI Interpretability


Mechanistic Interpretability: Solving the Agentic AI Trust Wall

 🤖Machine Learning  Content type: Discussion
bradenkelley.com

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

 🤖Machine Learning  Content type: Academic
arxiv.org

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

 🕸Knowledge Graphs  Content type: Academic
biorxiv.org

Exploration of a DNA Sequencing Basecaller using Activation Patching

 Incremental Computation
lesswrong.com

#5: Advertising is broken. It’s time to move your brand inside the model.

 🤖Machine Learning
rhizome.org

Geometric Foundations of AI Interpretability

 Category Theory
psychologyinaction.org

The technical community can't be the main character in AI safety anymore

 🦀Rust
substackcdn.com (Substack)

Don't let the LLM speak, just probe it (8 minute read)

 🤖Machine Learning  Content type: Blog
Less-relevant results

Is the Space Pope Reptilian?

 🌀Complexity Science  Content type: News

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

 🤖Machine Learning  Content type: Academic
arxiv.org

When Emotion Descriptors Fail: AI-Native Functions of Emotion Vectors

 🗃️Zettelkasten
lesswrong.com

scMTG reconstructs single-cell temporal dynamics with Markov transition generators

 Incremental Computation  Content type: Academic
biorxiv.org

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

 🤖Machine Learning  Content type: Academic
arxiv.org

Interpretable enzyme function prediction via sparse autoencoder features of ESMC across the microbial protein universe

 🚀MLOps  Content type: Academic
arxiv.org

Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models

 🤖Machine Learning  Content type: Academic
arxiv.org

Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

 👁️Observability  Content type: Academic
arxiv.org

FoldSAE: Learning to Steer Protein Folding Through Sparse Representations

 🤖Machine Learning  Content type: Academic
arxiv.org

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

 Incremental Computation  Content type: Academic
arxiv.org
