Scour
Transformer Architecture
Attention, BERT, GPT, Sequence Models
Scoured 200134 posts in 21.9 ms
A deep dive into the Transformer architecture
LLM Reasoning · blog.algomaster.io · 5d
Attention in transformers, step-by-step | Deep ...
🔍 Vector Search · 3blue1brown.com · 18h
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation
Deep Learning · arxiv.org · 1d
AI Paper Review: Language Models are Few-Shot Learners (GPT-3)
💬 Prompt Engineering · freecodecamp.org · 11h
needle/docs/simple_attention_networks.md at main
Local LLMs · github.com · 2d
Explainable AI: Visualizing Attention in Transformers
💬 Natural Language Processing · mlops.community · 5d
The usual implementation of attention transformers (SDPA) is kind of bad, actually
🔢 Kolmogorov Complexity · gist.github.com · 1d · Hacker News
AI 101: Your Ultimate Guide to Attention: Mechanism, QKV, and KV Cache
💬 Prompt Engineering · turingpost.com · 5d
Tracing Attention Computation Through Feature Interactions
💬 Prompt Engineering · transformer-circuits.pub · 4d
SymbioNet: Neuro-symbolic learning with morphological attention for interpretable acute lymphoblastic leukemia classification
🔍 Vector Search · sciencedirect.com · 4d
Think In Diffusion: Continuous Latent Diffusion Language Model
🎭 Anthropic Claude · mail.bycloud.ai · 6d
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
🌱 Stemming · arxiv.org · 3h
Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets
🔢 Kolmogorov Complexity · arxiv.org · 1d
One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer
Symbolic AI · arxiv.org · 3h
InfoFlow: A Framework for Multi-Layer Transformer Analysis
🔢 Kolmogorov Complexity · arxiv.org · 3h
HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation
🔗 RAG · arxiv.org · 3h
Attention Dispersion in Dynamic Graph Transformers: Diagnosis and a Transferable Fix
🔍 Vector Search · arxiv.org · 1d
From BERT to T5: A Study of Named Entity Recognition
📝 TextRank · arxiv.org · 3h
Parallel Recursive LSTM
🔗 RAG · arxiv.org · 3h
Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models
🔢 Kolmogorov Complexity · arxiv.org · 1d