🤖 Transformers
transformer model, attention mechanism, BERT, GPT architecture
Scoured 186,638 posts in 21.9 ms
Algorithm that gets ‘under the hood’ of AI models could effectively steer their responses
🤖 AI · nature.com · 1d
Bruce on AI Engineering
🕵️ AI Agents · heyuan110.com · 7h
Zero-Cost Transparent Semiotic Awareness for Frozen Language Models (SRT-Adapter)
🧠 LLM Training · sublius.substack.com · 4d · Substack
The Sequence AI of the Week #851: DeepSeek-V4 and the Architecture of Million-Token Intelligence
🤖 AI · substackcdn.com · 1d · Substack
Soul Player C64 – a 2-layer decoder-only transformer LLM
🧠 LLM Training · blog.adafruit.com · 2d
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
🧠 LLM Training · proceedings.neurips.cc · 5d
cauchy221/Alignment-Whack-a-Mole-Code: The official code repo of Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models
🔍 RAG · github.com · 22h · Hacker News
Enhanced fracture network permeability prediction using attention mechanism and Kolmogorov-Arnold Networks with SHAP interpretability analysis
🤖 AI · sciencedirect.com · 4h
LLM Quantization
🧠 LLM Training · huggingface.co · 1h · Hacker News
Zffnn - Comptime Neural Network Inference Engine
🤖 AI · ziggit.dev · 1d
Associative-State Universal Transformers: Sparse Retrieval Meets Structured Recurrence
🔍 RAG · arxiv.org · 21h
Presentation: Agents, Architecture, & Amnesia: Becoming AI-Native Without Losing Our Minds
🕵️ AI Agents · infoq.com · 1d
DeepSeek open-sources V4 large language model series
🧠 LLM Training · siliconangle.com · 6d
Two Heads Are Better Than One: Async Knowledge Injection for Speech AI with Tandem Architecture
🧠 LLM Training · pub.sakana.ai · 1d · Hacker News
Back to BERT in 2026: ModernGENA as a Strong, Efficient Baseline for DNA Foundation Models
🧬 Genomics · biorxiv.org · 5d
Computation in Superposition: Two Handcrafted Models
⚛️ Quantum Computing · lesswrong.com · 1d
A First-Principles Theory of Slow Thinking and Active Perception
🧩 Cognitive Science · global-sci.com · 3d
In new Anthropic Fellows research, we discuss “introspection adapters”: a tool that allows language models to self-report behaviors they've learned during train...
🧠 LLM Training · twitter.macworks.dev · 1d
Using Bag-of-Words With PyCharm
🔍 RAG · blog.jetbrains.com · 1d
Training language models to be warm can reduce accuracy and increase sycophancy
🧠 LLM Training · nature.com · 1d