🔎 AI Interpretability - jinkai_lau · Scour

Mechanistic Interpretability: The Key to Trusting Agentic AI

🛡️AI Safety Discussion

bradenkelley.com

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

🗄️Vector Databases Academic

Compositional and interpretable representation of histology using AI foundation models and sparse autoencoders

🔬AI Research Academic

[Paper] Dictionary Learning Identifiability for Understanding SAEs

🔬AI Research

lesswrong.com

Playing with Vision Embeddings

🔬AI Research

prestonbjensen.com · Hacker News

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

🛡️AI Safety Academic

Coelho Mollo and Millière: The Vector Grounding Problem

🔬AI Research

philosophyofbrains.com

Less-relevant results

BioByte 162: The Hype of Virtual Cells, ESMC's AlphaFold3-Like Performance, and the Prediction of Antibody Non-Specificity

🔬AI Research Blog

decodingbio.substack.com · Substack

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

🛡️AI Safety Academic

Who Elected Anthropic?

🛡️AI Safety Blog

vizierprime.substack.com · Substack

The technical community can't be the main character in AI safety anymore

🛡️AI Safety

substackcdn.com · Substack

FoldSAE: Learning to Steer Protein Folding Through Sparse Representations

🔬AI Research Academic

scMTG reconstructs single-cell temporal dynamics with Markov transition generators

🌐Open Source Academic

Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models

🧠LLMs Academic

Thoughts on 'Learning Mechanics'

🔬AI Research

lesswrong.com

Interactions Between Crosscoder Features: A Compact Proofs Perspective

🛡️AI Safety Academic

Pre-Intervention Prediction of Sparse Autoencoder Steering Side Effects

🧠LLMs Academic

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

🧠Language Models Academic

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

🔬AI Research Academic

SAE It Across Models: Explaining Features With Foreign NLA Verbalizers

lesswrong.com
