AI Interpretability

Scoured 66 posts in 7.0 ms

Mechanistic Interpretability: The Key to Trusting Agentic AI

 ✍️Prompt Engineering  Content type: Discussion
bradenkelley.com

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

 LLM Optimization  Content type: Academic
arxiv.org

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

 LLM Optimization  Content type: Academic
biorxiv.org

Mythos and the Adolescence of AI Policy

 ✍️Prompt Engineering  Content type: News
luizasnewsletter.com

Silicon Valley found AI and started looking for God

 💻Tech  Content type: News

The Rival Theologies of Artificial Intelligence

 ✍️Prompt Engineering  Content type: News
Less-relevant results

Don't let the LLM speak, just probe it (8 minute read)

 🤖AI  Content type: Blog
blog.j11y.io

[Paper] Dictionary Learning Identifiability for Understanding SAEs

 LLM Optimization
lesswrong.com

Is the Space Pope Reptilian?

 ✍️Prompt Engineering  Content type: News

The Calculated Spectacle Behind Magnifica Humanitas

 ✍️Prompt Engineering
firstthings.com

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

 LLM Optimization  Content type: Academic
arxiv.org

Playing with Vision Embeddings

 Model Efficiency

Best explanations of how LLMs work

 LLM Optimization  Content type: Blog

FoldSAE: Learning to Steer Protein Folding Through Sparse Representations

 LLM Optimization  Content type: Academic
arxiv.org

The technical community can't be the main character in AI safety anymore

 🔓Hacking
substackcdn.com · Substack

Compositional and interpretable representation of histology using AI foundation models and sparse autoencoders

 LLM Optimization  Content type: Academic
biorxiv.org

Interactions Between Crosscoder Features: A Compact Proofs Perspective

 LLM Optimization  Content type: Academic
arxiv.org

Coelho Mollo and Millière: The Vector Grounding Problem

 🤖AI

SAE It Across Models: Explaining Features With Foreign NLA Verbalizers

 LLM Optimization
lesswrong.com

Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models

 LLM Optimization  Content type: Academic
arxiv.org
