🔍 Interpretability - Bingran · Scour

Dual-Stance Evaluation of Sycophancy: The Structure of Agreement and the Limits of Intervention

💬LLMs Academic

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

💬LLMs Academic

TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment

💬LLMs Academic

MechLens: Late Crystallization of Factual Knowledge Explains Intervention Effectiveness in Language Models

💬LLMs Academic

A Unifying Framework for Concept-Based Representational Similarity

🔄Transformers Academic

Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

🧠AI Research Academic

Set-Based Transformer for Atmospheric Compensation in Standoff LWIR Hyperspectral Imaging

📐Scaling Laws Academic

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

⚙️Model Training Academic

Vision-Language Asymmetry in Bistable Image Captioning

🧠AI Research Academic

The Tell-Tale Norm: $\ell_2$ Magnitude as a Signal for Reasoning Dynamics in Large Language Models

🧠AI Research Academic

The Amplifying Mirror: Locating and Steering the Partisan Direction inside a Large Language Model

💬LLMs Academic

Unsupervised Pattern Analysis in Japanese Veterinary Toxicology: A Regulatory-Compliant Framework for Cross-Species Risk Assessment

📐Scaling Laws Academic

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

💬LLMs Academic

WAV: Multi-Resolution Block Residual Routing for Deep Decoder-Only Transformers

🔄Transformers Academic

TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models

⚙️Model Training Academic

Consistency Training Along the Transformer Stack

💬LLMs Academic

Dominant-Layer ZO: A Single Layer Dominates Zeroth-Order Fine-Tuning of LLMs

⚙️Model Training Academic

Localizing Prompt Ambiguity in Large Language Models with Probe-Targeted Attribution

💬LLMs Academic

Interpreting Brain Responses to Language with Sparse Features from Language Models

💬LLMs Academic

Dead Directions: Geometric Singular Learning

🎮Reinforcement Learning Academic

