🔍 AI Interpretability - laurynas · Scour

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

🎯Reranking Academic

rentruewang/inversql: Create SQL that matches your selection (with explainable AI), not the other way around

🗃️databases Code

github.com · Hacker News

Is the Space Pope Reptilian?

🎛️interfaces News

tearsinrain.ai · Hacker News

Playing with Vision Embeddings

🔍Semantic Search

prestonbjensen.com · Hacker News

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

🎯Reranking Academic

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

📍embeddings Academic

Interactions Between Crosscoder Features: A Compact Proofs Perspective

🎯Reranking Academic

Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

📍embeddings Academic

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

🌫Diffusion language models Academic

Pre-Intervention Prediction of Sparse Autoencoder Steering Side Effects

🎯Reranking Academic

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

🎯Reranking Academic

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

🎯Reranking Academic

One Lens, Many Worlds: A Capability-Typed Interface for World-Model Interpretability

🎮Reinforcement Learning Academic

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

🔍Semantic Search Academic

Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes

⚙Context engineering Academic

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

🎯Reranking Academic

Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers

🧪Property-based Testing Academic

Mechanistic Analysis of Alignment Algorithms in Language Models

🌫Diffusion language models Academic

Temporal Preference Concepts and their Functions in a Large Language Model

🌫Diffusion language models Academic

TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment

🎯Reranking Academic