🤖 Transformers · Attention Mechanism, Self-Attention, BERT, Architecture
Scoured 6138 posts in 17.3 ms
Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression · 💬 LLMs · arxiv.org · 1d

The Spectral Geometry of Thought: Phase Transitions, Instruction Reversal, Token-Level Dynamics, and Perfect Correctness Prediction in How Transformers Reason · 💡 AI Reasoning · arxiv.org · 5d

Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention · 🗜️ Compressed Sensing · arxiv.org · 1d

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures · 💬 LLMs · arxiv.org · 5d

How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models · 🧠 LLM · arxiv.org · 1d

Evaluating Post-hoc Explanations of the Transformer-based Genome Language Model DNABERT-2 · 🧠 LLM · arxiv.org · 1d

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings · 🧠 LLM · arxiv.org · 3d

Rethinking Intrinsic Dimension Estimation in Neural Representations · 🔥 PyTorch · arxiv.org · 2d

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts · 🔥 PyTorch · arxiv.org · 2d

Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension · 📐 Optimization Theory · arxiv.org · 5d

Dimensional Criticality at Grokking Across MLPs and Transformers · 🧠 LLM · arxiv.org · 4d

Sessa: Selective State Space Attention · 🤖 AI · arxiv.org · 4d

FOCAL-Attention for Heterogeneous Multi-Label Prediction · 💬 LLMs · arxiv.org · 3d

ESsEN: Training Compact Discriminative Vision-Language Transformers in a Low-Resource Setting · 💬 LLMs · arxiv.org · 4d

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps · 💬 LLMs · arxiv.org · 3d

Gradient-Based Program Synthesis with Neurally Interpreted Languages · 🧠 LLM · arxiv.org · 3d

Defragmenting Language Models: An Interpretability-based Approach for Vocabulary Expansion · 💬 LLMs · arxiv.org · 4d

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation · 💬 LLMs · arxiv.org · 5d

Improving Reasoning Capabilities in Small Models through Mixture-of-Layers Distillation with Stepwise Attention on Key Information · 💡 AI Reasoning · arxiv.org · 5d

SafeAnchor: Preventing Cumulative Safety Erosion in Continual Domain Adaptation of Large Language Models · 💬 LLMs · arxiv.org · 4d