Scour
🤖 Transformer Architecture
Attention, BERT, GPT, Sequence Models
Scoured 171386 posts in 31.3 ms
Attention With Actual Numbers
🤖 Transformers · pub.towardsai.net · 2h
Task Bert
🤖 Transformers · producthunt.com · 6d
Paper page - Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
🤖 Transformers · huggingface.co · 10h
BERT (2018)
🤖 Transformers · litecoder.medium.com · 2d
amitshekhariitbhu/llm-internals: Learn LLM internals step by step - from tokenization to attention to inference optimization.
🤖 LLM Inference · github.com · 1d · Hacker News
Fine-tuning a Summarization model
🔤 Tokenization · sakhawathossenofficial.medium.com · 16h
Fine-tuning a DistilBERT classifier with numerical and text inputs
🔤 Tokenization · engineering.freeagent.com · 4d
All in One for AI Chatbot
✍️ Prompt Engineering · nottoai.com · 1d · Hacker News
Implementing DeepSeek-V2’s Multi-Head Latent Attention (MLA) from Scratch in PyTorch
🔥 PyTorch · medium.com · 1d
GCA-DETR: Global-context-aware-based detection transformer
👁️ Computer Vision · sciencedirect.com · 4d
The Lyra Technique: Cognitive Geometry in Transformer KV-Caches — From Metacognition to Misalignment Detection
🧠 Stacked PKM · zenodo.org · 5d · r/artificial
A single-layer, single-head neural transformer written in PDP-11 assembly language
🧠 Neuromorphic Computing · blog.adafruit.com · 6d
Neural Networks
🧠 Deep Learning · rlj0713.medium.com · 4d
Low-Rank Key Value Attention: Reducing KV Cache Memory and Maintaining Head Diversity
⚡ Quantization · fin.ai · 5d · Hacker News
Understanding BERTopic: From Raw Text to Interpretable Topics
🤖 Transformers · analyticsvidhya.com · 3d
milanm/AutoGrad-Engine: A complete GPT language model (training and inference) in ~600 lines of pure C#, zero dependencies
🧠 LLM · github.com · 5d · Hacker News
Detecting Translation Hallucinations with Attention Misalignment
🧠 LLM · towardsdatascience.com · 6d
tmaselko/paper-attncap: Repository associated with the "Separate and Amplify: Attention's Geometry of Retrieval" paper. Contains TSAR synthetic task, minimal model, training/repro code, and chart/table generation.
🤖 Transformers · github.com · 6d · Hacker News
Neural Networks for Language: How Context Became a Learned Transformation
🧠 LLM · pub.towardsai.net · 4d
SPUTNIKAI/LeechTransformer: Leech-Lila: A Geometric Attention Transformer(Language Model) with the Leech Lattice Attention
🤖 LLM Inference · github.com · 6d · Hacker News