🔍 AI Interpretability - justjcullen · Scour

Mechanistic Interpretability: Solving the Agentic AI Trust Wall

🤖Machine Learning Discussion

bradenkelley.com

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

🤖Machine Learning Academic

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

🕸Knowledge Graphs Academic

Exploration of a DNA Sequencing Basecaller using Activation Patching

⚡Incremental Computation

lesswrong.com

#5: Advertising is broken. It’s time to move your brand inside the model.

🤖Machine Learning

Geometric Foundations of AI Interpretability

∘Category Theory

psychologyinaction.org

The technical community can't be the main character in AI safety anymore

substackcdn.com · Substack

Don't let the LLM speak, just probe it (8 minute read)

🤖Machine Learning Blog

blog.j11y.io · Hacker News · Cited by 1 article

Less-relevant results

The Bitter Lesson for Biology — Adam Green on Virtual Cells and Scaling Laws

🤖Software Engineering, AI, Personal Knowledge Management, Strongly Typed Languages, Math, Abstractions, Data Models, Event Sourcing News

letter.nikomc.com · Hacker News

Is the Space Pope Reptilian?

🌀Complexity Science News

tearsinrain.ai · Hacker News

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

🤖Machine Learning Academic

When Emotion Descriptors Fail: AI-Native Functions of Emotion Vectors

🗃️Zettelkasten

lesswrong.com

scMTG reconstructs single-cell temporal dynamics with Markov transition generators

⚡Incremental Computation Academic

Seven big ideas from 7x7

🤖Software Engineering, AI, Personal Knowledge Management, Strongly Typed Languages, Math, Abstractions, Data Models, Event Sourcing

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

🤖Machine Learning Academic

Interpretable enzyme function prediction via sparse autoencoder features of ESMC across the microbial protein universe

🚀MLOps Academic

Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models

🤖Machine Learning Academic

Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

👁️Observability Academic

FoldSAE: Learning to Steer Protein Folding Through Sparse Representations

🤖Machine Learning Academic

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

⚡Incremental Computation Academic
