🗄️ Web Datasets
Common Crawl, Corpus, Training data, Web scraping
Data Mixing for Large Language Models Pretraining: A Survey and Outlook
🔤 Tokenization · arxiv.org · 4d
Training Data is Still an Open Problem
✨ Gemini · andrej.xyz · 2d
What holds AI safety together? Co-authorship networks from 200 papers
🛡️ AI Safety · lesswrong.com · 19h
Using a Local LLM as a Zero-Shot Classifier
🔤 Tokenization · towardsdatascience.com · 2d
Language Generation in the Limit
💻 Programming languages · openreview.net · 23h
mtmn/corpus: self-hosted listenbrainz and last.fm frontend
⛰ Alpine.js · github.com · 6d · Lobsters
Assembling 450 Billion Tokens: The Training Data Nobody Had Ready
🔤 Tokenization · pub.towardsai.net · 2d
The week that Meta employees became training data
👁️ Surveillance Capitalism · platformer.news · 1d
Automated Deanonymization is Here
🕷️ Web Crawling · jefftk.com · 4d
AI providers have millions of agent sessions. The first 1,589 are public.
💳 AI Commerce · danielvanstrien.xyz · 4d
“No modern American city has ever run out of water. But chances are rising that Corpus Christi could be the first.”
💧 Water Infrastructure · kut.org · 1d · Hacker News
AI’s New Training Data: Your Old Work Slacks And Emails
🆕 New AI · forbes.com · 6d · Hacker News, r/privacy
Habeas Corpus Cases, Twitter, Ukraine Cultural Heritage, More: Monday ResearchBuzz, April 20, 2026
📰 RSS Reading Practices · researchbuzz.me · 5d
Association Is Not Similarity: Learning Corpus-Specific Associations for Multi-Hop Retrieval
🔍 SPLADE · arxiv.org · 1d
How I run distributed Rust fuzzing in GitHub Actions
🦀 Rust Web Services · depot.dev · 3d
Datasets - UCI Machine Learning Repository
📊 Vector Databases · archive.ics.uci.edu · 12h
Report: Meta will train AI agents by tracking employees' mouse, keyboard use
🆕 New AI · arstechnica.com · 4d
Embeddings & Vector Search
🎯 Vector Search · taoofmac.com · 14h
Machine learning and digital pragmatics: Which word category influences emoji use most?
🔤 Tokenization · arxiv.org · 1d
AI scouts for journalists
📰 RSS Reading Practices · cojournalist.ai · 14h