🔍 Interpretability - Bingran · Scour

Mechanistic Interpretability: The Key to Trusting Agentic AI

🤖AI Agents Discussion

bradenkelley.com

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

🧠AI Research Academic

You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them

lesswrong.com

Compositional and interpretable representation of histology using AI foundation models and sparse autoencoders

📉Deep Learning Academic

Is the Space Pope Reptilian?

🔄Transformers News

tearsinrain.ai · Hacker News

Can You Hide From a Natural Language Autoencoder?

⚙️Model Training Blog

yogesh.bearblog.dev

mingusb/transformer-golf: The Fully Unrolled Transformer: An experimental repository for architecture simplification and compilation. [2026]

📉Deep Learning Code

github.com · Hacker News

Arithmetic Without Numbers – How LLMs Do Math

alvaro-videla.com · Hacker News

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

💬LLMs Academic

How LLMs work | Practical Leaders

practical-leaders.com · Hacker News

Playing with Vision Embeddings

📐Scaling Laws

prestonbjensen.com · Hacker News

The Standard Interpretable Model: A general theory of interpretable machine learning to deductively design interpretable methods using Lagrangian mechanics

🖥️ML Systems Academic

BioByte 162: The Hype of Virtual Cells, ESMC's AlphaFold3-Like Performance, and the Prediction of Antibody Non-Specificity

🖥️ML Systems Blog

decodingbio.substack.com · Substack

Machinic Psychopharmacology: Do LLMs Self-Medicate?

⚙️Model Training

lesswrong.com · Hacker News

Coelho Mollo and Millière: The Vector Grounding Problem

⚙️Model Training

philosophyofbrains.com

Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

🧠AI Research Academic

How LLMs Actually Work: A Friendly Map for Humans • oreoro

🔄Transformers

oreoro.github.io · Hacker News

princezuda/-RequiemGPT-: Fully open source and open weights built and trained by fable five with one prompt. An experience in how AI actually works

🔥PyTorch Code

github.com · Hacker News

scMTG reconstructs single-cell temporal dynamics with Markov transition generators

📐Scaling Laws Academic

Trajectory Geometry of Transformer Representations Across Layers

🔄Transformers Academic
