🗄️ Web Datasets
Common Crawl, Corpus, Training data, Web scraping
Scoured 138 posts in 51.2 ms
Publishers push Common Crawl to stop collecting content for AI training
🔗 Interoperability
searchengineland.com · 1 day ago
US Publishers Demand Common Crawl Stop Scraping Their Content via @sejournal, @MattGSouthern
🎆 Year End
searchenginejournal.com · 23 hours ago
Microsoft trained its MAI models on unlicensed web data despite promising "enterprise grade, clean and commercially licensed data"
📰 Content Curation
the-decoder.com · 5 days ago
Common Crawl Foundation at IIPC-WAC 2026
🏛️ Internet Archive
Content type: Blog
commoncrawl.org · 1 day ago
mikinko/HuggingFace_WFX: Total Commander WFX plugin for HuggingFace repos
🔀 JJ
Content type: Code
github.com · 4 days ago · r/StableDiffusion
Testarvette: Ferrari Testarossa Replica
🚩 CTF Writeups
barnfinds.com · 23 hours ago
AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis
🤖 AI
Content type: Academic
arxiv.org · 1 day ago · Hacker News
Pythia 1.4B reproduces 3.6% of training samples verbatim given 950-token prompts
⚡ Fast AI Inference
Content type: Blog
ret2libc.com · 4 days ago · Hacker News
US publishers tell Common Crawl to stop scraping and delete archive
🔗 Interoperability
pressgazette.co.uk · 1 day ago · Hacker News
Google’s DiffusionGemma is 4x faster than its other Gemma models
🤖 AI
thenewstack.io · 7 hours ago
NVIDIA releases Nemotron 3 Ultra, claiming five times the speed and 30 percent lower costs than prior models. The model delivers 300 tokens per second on benchmar...
⚡ Fast AI Inference
digg.com · 6 days ago
know the products now; snap up deals later
🎯 Recommendation Metrics
techradar.com · 20 hours ago
Email ownership, I give up.
🧹 Spam Filters
Content type: Discussion
lemmy.world · 2 days ago
Three sleep intervals for three APIs: Steam 250ms, GitHub 100ms, HuggingFace none
🔀 JJ
Content type: Reference
docs.github.com · 4 days ago · DEV
Enshittification Merch That Actually Fights Enshittification
🎨 Graphic Design
eff.org · 7 hours ago
My life as a human pincushion continues (Day 17, post-surgery)
🎆 Year End
creolened.com · 1 day ago
Tejas-TA/predikit: The missing bridge between your ML models and your AI agents.
🔧 Agent Tooling
Content type: Code
github.com · 5 hours ago · Hacker News
nex-agi/Nex-N2-mini • Huggingface
🏗️ LLM Infrastructure
huggingface.co · 6 days ago · r/LocalLLaMA
A Few Good Things - Vol. 22
🍜 Umami
brandons-journal.com · 3 days ago
SafeRun: Enabling Determinism in LLM Planning for Running
🏆 LLM Benchmarking
Content type: Academic
arxiv.org · 1 day ago