🔍 AI Interpretability - hop1.ng.1357 · Scour

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

🎯Alignment Research Academic

The economics of speculative decoding

🇨🇳Chinese AI Blog

fergusfinn.com··Hacker News

princezuda/-RequiemGPT-: Fully open source and open weights built and trained by fable five with one prompt. An experience in how AI actually works

🧠Machine Learning Code

github.com··Hacker News

Is the Space Pope Reptilian?

🎯Alignment Research News

tearsinrain.ai··Hacker News

Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes

🎯Alignment Research Academic

Playing with Vision Embeddings

🧠Machine Learning

prestonbjensen.com··Hacker News

One Lens, Many Worlds: A Capability-Typed Interface for World-Model Interpretability

✨Gemini Academic

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

🎯Alignment Research Academic

Trajectory Geometry of Transformer Representations Across Layers

🎯Alignment Research Academic

SAT-Physical Thermodynamic Framework: treating constraints as a thermal system

🗣️Linguistics Code

github.com··Hacker News

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

🗣️Linguistics Academic

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

🎯Alignment Research Academic

Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

🎯Alignment Research Academic

Interactions Between Crosscoder Features: A Compact Proofs Perspective

🎯Alignment Research Academic

Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers

🎯Alignment Research Academic

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

🎯Alignment Research Academic

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

🎯Alignment Research Academic

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

🧠Machine Learning Academic

Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models

✨Gemini Academic

RowNet: A Memory Transformer for Tabular Regression

🔢BitNet Academic
