Mech Interp

mechanistic interpretability, circuits, superposition, feature visualization, AI interpretability

A Unifying Framework for Concept-Based Representational Similarity

 🔍Interpretability  Content type: Academic
arxiv.org·

Inside the LLM Word Factory

 🔧Compilers  Content type: Academic
arxiv.org·

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

 🔣Category Theory  Content type: Academic
arxiv.org·

What's in a Name? Morphological Shortcuts by LLMs in Pharmacology

 🔧Compilers  Content type: Academic
arxiv.org·

DiffoR: A Unified Continuous Generative Framework for Universal Ordinal Regression

 🛡️AI Safety  Content type: Academic
arxiv.org·

The Amplifying Mirror: Locating and Steering the Partisan Direction inside a Large Language Model

 🔍Interpretability  Content type: Academic
arxiv.org·

When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following

 🛡️AI Safety  Content type: Academic
arxiv.org·

Beyond Text Following: Repairable Arbitration Reversals in Audio-Language Models

 🌐Distributed Systems  Content type: Academic
arxiv.org·

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

 🔍Interpretability  Content type: Academic
arxiv.org·

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

 🛡️AI Safety  Content type: Academic
arxiv.org·

Wearable Single-Lead ECG Detects Fine-Grained Structural Heart Disease Through Echo-Report Supervision

 🔍Interpretability  Content type: Academic
arxiv.org·

Arithmetic Pedagogy for Language Models

 🔍Interpretability  Content type: Academic
arxiv.org · Hacker News

Unsupervised Pattern Analysis in Japanese Veterinary Toxicology: A Regulatory-Compliant Framework for Cross-Species Risk Assessment

 🛡️AI Safety  Content type: Academic
arxiv.org·
