🤗 Transformers
Attention Mechanism, Self-Attention, BERT, Architecture
Scoured 6125 posts in 13.5 ms
Transformers
💬 LLMs · chizkidd.github.io · 2d · Hacker News
ml-intern
💬 LLMs · producthunt.com · 3d
The Recurrent Transformer: Greater Effective Depth and Efficient Decoding
LLM · arxiv.org · 1d
kyegomez/OpenMythos: A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.
LLM · github.com · 5d · Hacker News
Watch language models think.
💬 LLMs · openinterp.org · 1d · Hacker News
Transformer vs CNN-LSTM: CWRU Bearing 96% vs 92% Accuracy
LLM · tildalice.io · 4d
Neural Networks Explained In Plain English
Machine Learning · blog.algomaster.io · 4d
Working Memory Constraints Scaffold Learning in Transformers under Data Scarcity
LLM · arxiv.org · 2d
Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales
Optimization Theory · arxiv.org · 2d
Absorber LLM: Harnessing Causal Synchronization for Test-Time Training
💬 LLMs · arxiv.org · 1d
Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling
LLM · arxiv.org · 3d
Hyperloop Transformers
💬 LLMs · arxiv.org · 1d
An explicit operator explains end-to-end computation in the modern neural networks used for sequence and language modeling
LLM · arxiv.org · 2d
The Topological Trouble With Transformers
💬 LLMs · arxiv.org · 4d
OThink-SRR1: Search, Refine and Reasoning with Reinforced Learning for Large Language Models
💬 LLMs · arxiv.org · 2d
Tracing Relational Knowledge Recall in Large Language Models
💬 LLMs · arxiv.org · 2d
SigGate-GT: Taming Over-Smoothing in Graph Transformers via Sigmoid-Gated Attention
LLM · arxiv.org · 4d
Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling
LLM · arxiv.org · 2d
The Spectral Geometry of Thought: Phase Transitions, Instruction Reversal, Token-Level Dynamics, and Perfect Correctness Prediction in How Transformers Reason
💡 AI Reasoning · arxiv.org · 5d
MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models
💬 LLMs · arxiv.org · 2d