Interpretability

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

馃Cognitive Neurosciens for AIContent type: Academic
arxiv.org

Self-Explainability in Self-Adaptive and Self-Organising Systems: Status and Research Directions

🎯 Alignment | Content type: Academic
arxiv.org

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

💾 Memory Systems | Content type: Academic
arxiv.org

Decoding Naturalistic Emotion Dynamics from the Brain: An LLM-Enhanced Regression Framework

馃NeuroscienceContent type: Academic
arxiv.org

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

🌀 Hallucination | Content type: Academic
arxiv.org

Shared Latent Structures Enable Unified Backdoor Detection and Mitigation in LLMs

💾 Memory Systems | Content type: Academic
arxiv.org

Temporal Preference Concepts and their Functions in a Large Language Model

💾 Memory Systems | Content type: Academic
arxiv.org

TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment

🎨 Multimodal AI | Content type: Academic
arxiv.org

AI-Native Closed-Loop Security for 6G-Enabled Cyber-Physical Systems: From Edge Detection to Network-Wide Mitigation

🎯 Alignment | Content type: Academic
arxiv.org

Explainable AI-Driven Cyber Risk Analytics and Model Reliability Assessment for Intelligent Governance of U.S. Critical Infrastructure: An XGBoost and SHAP-Based Intrusion Detection Framework

🎯 Alignment | Content type: Academic
arxiv.org

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

🎨 Multimodal AI | Content type: Academic
arxiv.org

A Unifying Framework for Concept-Based Representational Similarity

🔍 RAG | Content type: Academic
arxiv.org

The Tell-Tale Norm: $\ell_2$ Magnitude as a Signal for Reasoning Dynamics in Large Language Models

馃Cognitive Neurosciens for AIContent type: Academic
arxiv.org

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

💾 Memory Systems | Content type: Academic
arxiv.org

Vision-Language Asymmetry in Bistable Image Captioning

🎨 Multimodal AI | Content type: Academic
arxiv.org

The Amplifying Mirror: Locating and Steering the Partisan Direction inside a Large Language Model

🔍 RAG | Content type: Academic
arxiv.org

DiffoR: A Unified Continuous Generative Framework for Universal Ordinal Regression

🎯 Alignment | Content type: Academic
arxiv.org

When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following

💾 Memory Systems | Content type: Academic
arxiv.org

Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

馃Cognitive Neurosciens for AIContent type: Academic
arxiv.org

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

🎯 Alignment | Content type: Academic
arxiv.org