🗄️ Web Datasets
Common Crawl, Corpus, Training data, Web scraping
Data Mixing for Large Language Models Pretraining: A Survey and Outlook
🔤 Tokenization · arxiv.org · 4d
Training Data is Still an Open Problem
✨ Gemini · andrej.xyz · 2d
Perufitlife/multi-scraper-mcp: 12 web scraping tools as MCP server for AI agents (Claude Desktop, ChatGPT, Cursor). Reddit, Amazon, eBay, Google Maps, Yelp, YouTube, TikTok, Indeed, Trustpilot, contact finder, SaaS pricing.
💳 Content Monetization · github.com · 16h · DEV
Plexus: A WiFi Graph RAG for Network Troubleshooting
🔥 Prometheus · app.plexus.pw · 12h · DEV
The week that Meta employees became training data
👁️ Surveillance Capitalism · platformer.news · 2d
Distilling YouTube Into a Queryable Graph
🔍 Information Retrieval · jaime.win · 3d
What holds AI safety together? Co-authorship networks from 200 papers
🛡️ AI Safety · lesswrong.com · 1d
Tesla’s ‘Clean’ Lithium Supply Faces Questions After Toxic Metals Found in Wastewater
💎 Critical Minerals · autoblog.com · 6h
As agentic AI pushes rivals to raise prices and cap usage, Deepseek ships a good-enough model for almost nothing
🇨🇳 Chinese AI · the-decoder.com · 1d
Pragmata: Where to Find All Training Data
✨ Gemini · hardcoregamer.com · 3d
AI Without Canada: Why the Heritage Committee’s AI Report Could Lead to Less Canadian Content in the Training Data
💳 Content Monetization · michaelgeist.ca · 1d
SAIT-EMA: A Tridimensional Electromagnetic Articulography Database for Mandarin with Diverse Language Backgrounds
🔤 Tokenization · nature.com · 3d
Language Generation in the Limit
💻 Programming languages · openreview.net · 1d
5% annual increase in SIP can boost your retirement corpus by more than ₹83 lakh. Here's how
🏠 Home Assistant · livemint.com · 1d
AI’s New Training Data: Your Old Work Slacks And Emails
🆕 New AI · bespacific.com · 5d
80% of the world doesn't think in English. Are you building AI for them? (Sponsor)
🤖 AI · go.welodata.ai · 2d
mtmn/corpus: self-hosted listenbrainz and last.fm frontend
⛰ Alpine.js · github.com · 6d · Lobsters
Automated Deanonymization is Here
🕷️ Web Crawling · jefftk.com · 4d
Association Is Not Similarity: Learning Corpus-Specific Associations for Multi-Hop Retrieval
🔍 SPLADE · arxiv.org · 1d
AI providers have millions of agent sessions. The first 1,589 are public.
💳 AI Commerce · danielvanstrien.xyz · 4d