🔬 Interpretability - sunzhongxiang · Scour

Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

🧠Cognitive Neuroscience for AI Academic

The technical community can't be the main character in AI safety anymore

substackcdn.com · Substack

scMTG reconstructs single-cell temporal dynamics with Markov transition generators

💾Memory Systems Academic

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

🧠Cognitive Neuroscience for AI Academic

Less-relevant results

AI-augmented coaching platform specifically for dissertation/thesis students

🎯Alignment Discussion

dissertationcoach.ai · r/SideProject

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

🔍RAG Academic

SAE It Across Models: Explaining Features With Foreign NLA Verbalizers

🎨Multimodal AI

lesswrong.com

Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes

🧠Neuroscience Academic

Who Elected Anthropic?

🎯Alignment Blog

vizierprime.substack.com · Substack

Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers

🧠Cognitive Neuroscience for AI Academic

Interactions Between Crosscoder Features: A Compact Proofs Perspective

🎨Multimodal AI Academic

FoldSAE: Learning to Steer Protein Folding Through Sparse Representations

💾Memory Systems Academic

Mechanistic Analysis of Alignment Algorithms in Language Models

🎯Alignment Academic

Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models

🦾Embodied AI Academic

A Deployment-Oriented Framework for Explainable AI-Assisted eBPF/XDP Mitigation at the IoT Edge

🎯Alignment Academic

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

🧠Cognitive Neuroscience for AI Academic

Pre-Intervention Prediction of Sparse Autoencoder Steering Side Effects

🧠Cognitive Neuroscience for AI Academic

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

🌀Hallucination Academic

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

🎨Multimodal AI Academic

Inside the LLM Word Factory

💾Memory Systems Academic
