Web Datasets

Feeds to Scour
Scoured 137 posts in 29.4 ms

Publishers push Common Crawl to stop collecting content for AI training

 🔗Interoperability

Microsoft trained its MAI models on unlicensed web data despite promising "enterprise grade, clean and commercially licensed data"

 📰Content Curation
the-decoder.com

The Integrity Graph: The Missing Layer In Your AI Visibility Audit via @sejournal, @billhunt

 🎛️Feed Filtering
searchenginejournal.com

US publishers tell Common Crawl to stop scraping and delete archive

 🔗Interoperability

mikinko/HuggingFace_WFX: Total Commander WFX plugin for HuggingFace repos

 🔀JJ  Content type: Code

Common Crawl Foundation at IIPC-WAC 2026

 🏛️Internet Archive  Content type: Blog
commoncrawl.org

Pythia 1.4B reproduces 3.6% of training samples verbatim given 950-token prompts

 Fast AI Inference  Content type: Blog
ret2libc.com · Hacker News

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

 🤖AI  Content type: Academic
arxiv.org · Hacker News

Google’s DiffusionGemma is 4x faster than its other Gemma models

 🤖AI
thenewstack.io

Testarvette: Ferrari Testarossa Replica

 🚩CTF Writeups
barnfinds.com

NVIDIA releases Nemotron 3 Ultra, claiming five times the speed and 30 percent lower costs than prior models
The model delivers 300 tokens per second on benchmar...

 Fast AI Inference
digg.com

Enshittification Merch That Actually Fights Enshittification

 🎨Graphic Design
eff.org

Three sleep intervals for three APIs: Steam 250ms, GitHub 100ms, HuggingFace none

 🔀JJ  Content type: Reference
docs.github.com · DEV

Kinsta adds free bot protection to all WordPress plans

 💳Content Monetization
ppc.land

My life as a human pincushion continues (Day 17, post-surgery)

 🎆Year End
creolened.com

Tejas-TA/predikit: The missing bridge between your ML models and your AI agents.

 🔧Agent Tooling  Content type: Code
github.com · Hacker News

nex-agi/Nex-N2-mini • Huggingface

 🏗️LLM Infrastructure

A Few Good Things - Vol. 22

 🍜Umami
brandons-journal.com

SafeRun: Enabling Determinism in LLM Planning for Running

 🏆LLM Benchmarking  Content type: Academic
arxiv.org

Stack Overflow didn't just help AI learn to code

 🤖AI
