🗄️ Web Datasets
Common Crawl, Corpus, Training data, Web scraping
Publishers push Common Crawl to stop collecting content for AI training
🔗 Interoperability
searchengineland.com · 1 day ago
Microsoft trained its MAI models on unlicensed web data despite promising "enterprise grade, clean and commercially licensed data"
📰 Content Curation
the-decoder.com · 5 days ago
The Integrity Graph: The Missing Layer In Your AI Visibility Audit via @sejournal, @billhunt
🎛️ Feed Filtering
searchenginejournal.com · 14 hours ago
US publishers tell Common Crawl to stop scraping and delete archive
🔗 Interoperability
pressgazette.co.uk · 1 day ago · Hacker News
mikinko/HuggingFace_WFX: Total Commander WFX plugin for HuggingFace repos
🔀 JJ
Content type: Code
github.com · 4 days ago · r/StableDiffusion
Common Crawl Foundation at IIPC-WAC 2026
🏛️ Internet Archive
Content type: Blog
commoncrawl.org · 1 day ago
Pythia 1.4B reproduces 3.6% of training samples verbatim given 950-token prompts
⚡ Fast AI Inference
Content type: Blog
ret2libc.com · 4 days ago · Hacker News
AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis
🤖 AI
Content type: Academic
arxiv.org · 2 days ago · Hacker News
Google’s DiffusionGemma is 4x faster than its other Gemma models
🤖 AI
thenewstack.io · 10 hours ago
Testarvette: Ferrari Testarossa Replica
🚩 CTF Writeups
barnfinds.com · 1 day ago
NVIDIA releases Nemotron 3 Ultra, claiming five times the speed and 30 percent lower costs than prior models. The model delivers 300 tokens per second on benchmar...
⚡ Fast AI Inference
digg.com · 6 days ago
Enshittification Merch That Actually Fights Enshittification
🎨 Graphic Design
eff.org · 10 hours ago
Three sleep intervals for three APIs: Steam 250ms, GitHub 100ms, HuggingFace none
🔀 JJ
Content type: Reference
docs.github.com · 4 days ago · DEV
Kinsta adds free bot protection to all WordPress plans
💳 Content Monetization
ppc.land · 11 hours ago
My life as a human pincushion continues (Day 17, post-surgery)
🎆 Year End
creolened.com · 2 days ago
Tejas-TA/predikit: The missing bridge between your ML models and your AI agents.
🔧 Agent Tooling
Content type: Code
github.com · 8 hours ago · Hacker News
nex-agi/Nex-N2-mini • Huggingface
🏗️ LLM Infrastructure
huggingface.co · 6 days ago · r/LocalLLaMA
A Few Good Things - Vol. 22
🍜 Umami
brandons-journal.com · 3 days ago
SafeRun: Enabling Determinism in LLM Planning for Running
🏆 LLM Benchmarking
Content type: Academic
arxiv.org · 2 days ago
Stack Overflow didn't just help AI learn to code
🤖 AI
zozo123.github.io · 3 days ago · Hacker News