🗄️ Web Datasets
Common Crawl, Corpus, Training data, Web scraping
Scoured 66 posts in 29.2 ms
Show HN: Dataset for AI training and fine tuning
🏗️
LLM Infrastructure
neurvance.com
·
2d
·
Hacker News
A Wikipedia Clone Built on AI Hallucinations Is Here to Hasten Along the Death of the Internet
🛡️
Content Moderation
gizmodo.com
·
6d
Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance
🏆
LLM Benchmarking
arxiv.org
·
23h
Lecturing Common Crawl: Publishers Tell Nonprofit To Stop Unauthorized Scraping
🤖
Web Crawling Politeness
mediapost.com
·
1d
·
Hacker News
AI writing hits a ceiling
🎭
Claude
axios.com
·
5d
·
Hacker News
Blog - April 2026 Crawl Archive Now Available in a Hugging Face Storage Bucket
📡
RSS
commoncrawl.org
·
1d
LLM Ranking Factors
🎯
BM25
oppalerts.com
·
2d
Guests on our own web
🌐
ARPANET History
jolek78.writeas.com
·
4d
Disney erased FiveThirtyEight
🏛️
Politics
natesilver.net
·
1d
·
Hacker News
GoogleChrome/modern-web-guidance
📐
Progressive Enhancement
github.com
·
1h
Coding agents for data analysis
💻
Coding Agents
simonw.github.io
·
6d
How to Clean Time Series Data in Python
💧
Drop Check
freecodecamp.org
·
2d
·
Hacker News
Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters
✨
Gemini
arxiv.org
·
23h
New AI system classifies India rainfall better, cutting false alarms and missed heavy rain
🆕
New AI
phys.org
·
6d
Fire Detection Without Training a Model? Edge RAG Does It Better
🔥
Burn
pub.towardsai.net
·
1d
If I Were Emperor of New AI Safety Researcher Training...
🤔
Philosophy of Tech
lesswrong.com
·
3h
sparshrestha/NewsQA-LSTM: LSTM-based Question Answering system on News Articles. Includes pipeline for data ingestion, BiLSTM retriever, and LSTM+attention reader with citations (URL, Headline, Date). Part of a coursework at Kathmandu University.
🔤
Tokenization
github.com
·
1d
·
Hacker News
Show HN: I built a Web-Scraper API that is 6-7x more efficient than current ones
💰
Web Monetization API
scrapewithruno.com
·
6d
·
Hacker News
Infini-News: Efficiently Queryable Access to 1.3 Billion Processed Common Crawl News Articles
🔍
Information Retrieval
arxiv.org
·
1d
CEO Interview with Adi Gelvan of Speedata
⚡
ClickHouse
semiwiki.com
·
3d