AI Interpretability

Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

Set-Based Transformer for Atmospheric Compensation in Standoff LWIR Hyperspectral Imaging

 Model Efficiency  Content type: Academic
arxiv.org·

TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment

 LLM Optimization  Content type: Academic
arxiv.org·

A Unifying Framework for Concept-Based Representational Similarity

 LLM Optimization  Content type: Academic
arxiv.org·

Temporal Preference Concepts and their Functions in a Large Language Model

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

Measuring a hate speech spectrum with faceted Rasch item response theory and perspective-aware, explainable-by-design deep learning

 LLM Optimization  Content type: Academic
arxiv.org·

Inside the LLM Word Factory

 LLM Optimization  Content type: Academic
arxiv.org·

The Rival Theologies of Artificial Intelligence

 ✍️Prompt Engineering
palladiummag.com·

Vision-Language Asymmetry in Bistable Image Captioning

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

 🤖AI  Content type: Academic
arxiv.org·

RGB-S: Image-Aligned Tactile Saliency for Robust Dexterous Manipulation

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

The Tell-Tale Norm: $\ell_2$ Magnitude as a Signal for Reasoning Dynamics in Large Language Models

 LLM Optimization  Content type: Academic
arxiv.org·

The Amplifying Mirror: Locating and Steering the Partisan Direction inside a Large Language Model

 LLM Optimization  Content type: Academic
arxiv.org·

DiffoR: A Unified Continuous Generative Framework for Universal Ordinal Regression

 LLM Optimization  Content type: Academic
arxiv.org·

When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

 🤖AI  Content type: Academic
arxiv.org·

Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

 ✍️Prompt Engineering  Content type: Academic
arxiv.org·

Wearable Single-Lead ECG Detects Fine-Grained Structural Heart Disease Through Echo-Report Supervision

 📡RSS  Content type: Academic
arxiv.org·