🤖 Transformers · Attention Mechanism, Self-Attention, BERT, Architecture
Scoured 6138 posts in 17.3 ms
Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression · 💬 LLMs · arxiv.org · 1d

The Spectral Geometry of Thought: Phase Transitions, Instruction Reversal, Token-Level Dynamics, and Perfect Correctness Prediction in How Transformers Reason · 💡 AI Reasoning · arxiv.org · 5d

Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention · 🗜️ Compressed Sensing · arxiv.org · 1d

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures · 💬 LLMs · arxiv.org · 5d

How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models · 🧠 LLM · arxiv.org · 1d

Evaluating Post-hoc Explanations of the Transformer-based Genome Language Model DNABERT-2 · 🧠 LLM · arxiv.org · 1d

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings · 🧠 LLM · arxiv.org · 3d

Rethinking Intrinsic Dimension Estimation in Neural Representations · 🔥 PyTorch · arxiv.org · 2d

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts · 🔥 PyTorch · arxiv.org · 2d

Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension · 📐 Optimization Theory · arxiv.org · 5d

Dimensional Criticality at Grokking Across MLPs and Transformers · 🧠 LLM · arxiv.org · 4d

Sessa: Selective State Space Attention · 🤖 AI · arxiv.org · 4d

FOCAL-Attention for Heterogeneous Multi-Label Prediction · 💬 LLMs · arxiv.org · 3d

ESsEN: Training Compact Discriminative Vision-Language Transformers in a Low-Resource Setting · 💬 LLMs · arxiv.org · 4d

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps · 💬 LLMs · arxiv.org · 3d

Gradient-Based Program Synthesis with Neurally Interpreted Languages · 🧠 LLM · arxiv.org · 3d

Defragmenting Language Models: An Interpretability-based Approach for Vocabulary Expansion · 💬 LLMs · arxiv.org · 4d

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation · 💬 LLMs · arxiv.org · 5d

Improving Reasoning Capabilities in Small Models through Mixture-of-Layers Distillation with Stepwise Attention on Key Information · 💡 AI Reasoning · arxiv.org · 5d

SafeAnchor: Preventing Cumulative Safety Erosion in Continual Domain Adaptation of Large Language Models · 💬 LLMs · arxiv.org · 4d