🗄️ Web Datasets
Common Crawl, Corpus, Training data, Web scraping
Scoured 72 posts in 33.1 ms
Publishers push Common Crawl to stop collecting content for AI training
🔗 Interoperability · searchengineland.com · 23 hours ago
US publishers tell Common Crawl to stop scraping and delete archive
🔗 Interoperability · pressgazette.co.uk · 1 day ago · Hacker News
Pythia 1.4B reproduces 3.6% of training samples verbatim given 950-token prompts
⚡ Fast AI Inference · Content type: Blog · ret2libc.com · 3 days ago · Hacker News
Common Crawl Foundation at IIPC-WAC 2026
🏛️ Internet Archive · Content type: Blog · commoncrawl.org · 23 hours ago
AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis
🤖 AI · Content type: Academic · arxiv.org · 1 day ago · Hacker News
NVIDIA releases Nemotron 3 Ultra, claiming five times the speed and 30 percent lower costs than prior models. The model delivers 300 tokens per second on benchmar...
⚡ Fast AI Inference · digg.com · 6 days ago
Tejas-TA/predikit: The missing bridge between your ML models and your AI agents.
🔧 Agent Tooling · Content type: Code · github.com · 4 hours ago · Hacker News
LangChain Series #2: Models Explained — LLMs, Chat Models, and Embeddings with Practical…
📊 Embeddings · pub.towardsai.net · 1 day ago
nex-agi/Nex-N2-mini • Huggingface
🏗️ LLM Infrastructure · huggingface.co · 6 days ago · r/LocalLLaMA
Google’s DiffusionGemma is 4x faster than its other Gemma models
🤖 AI · thenewstack.io · 5 hours ago
My life as a human pincushion continues (Day 17, post-surgery)
🎆 Year End · creolened.com · 1 day ago
Stack Overflow didn't just help AI learn to code
🤖 AI · zozo123.github.io · 3 days ago · Hacker News
OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.
🔗 Interoperability · Content type: Blog · huggingface.co · 2 days ago · Hacker News, r/LocalLLaMA
Less-relevant results
Testing MiniMax M3 on real tasks: repo refactor, screenshot debugging, and Spotify recommendations
🆕 New AI · Content type: Blog · andlukyane.com · 23 hours ago · Hacker News
Enshittification Merch That Actually Fights Enshittification
🎨 Graphic Design · eff.org · 5 hours ago
How I stay connected (Bear Blog Carnival)
🧘 Digital Minimalism · Content type: Blog · hung.bearblog.dev · 3 days ago
Purpose-built local AI agents
🤖 AI · Content type: Blog · samihonkonen.com · 2 days ago · Hacker News
Notice from SASAC and MIIT on jointly launching the 2026 Special Action Plan for Real-Scene Training of Humanoid Robots and Embodied Intelligence
🇨🇳 China Tech Policy · threadreaderapp.com · 2 hours ago
Job Searcher
🤖 AI · Content type: Blog · huggingface.co · 4 days ago
SafeRun: Enabling Determinism in LLM Planning for Running
🏆 LLM Benchmarking · Content type: Academic · arxiv.org · 1 day ago