Web Datasets

Publishers push Common Crawl to stop collecting content for AI training

 🔗Interoperability
searchengineland.com·

US publishers tell Common Crawl to stop scraping and delete archive

 🔗Interoperability

Pythia 1.4B reproduces 3.6% of training samples verbatim given 950-token prompts

 Fast AI Inference  Content type: Blog
ret2libc.com··Hacker News

Common Crawl Foundation at IIPC-WAC 2026

 🏛️Internet Archive  Content type: Blog
commoncrawl.org·

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

 🤖AI  Content type: Academic
arxiv.org··Hacker News

NVIDIA releases Nemotron 3 Ultra, claiming five times the speed and 30 percent lower costs than prior models

The model delivers 300 tokens per second on benchmar...

 Fast AI Inference
digg.com·

Tejas-TA/predikit: The missing bridge between your ML models and your AI agents.

 🔧Agent Tooling  Content type: Code
github.com··Hacker News

LangChain Series #2: Models Explained — LLMs, Chat Models, and Embeddings with Practical…

 📊Embeddings
pub.towardsai.net·

nex-agi/Nex-N2-mini • Huggingface

 🏗️LLM Infrastructure

Google’s DiffusionGemma is 4x faster than its other Gemma models

 🤖AI
thenewstack.io·

My life as a human pincushion continues (Day 17, post-surgery)

 🎆Year End
creolened.com·

Stack Overflow didn't just help AI learn to code

 🤖AI

OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.

 🔗Interoperability  Content type: Blog

Less-relevant results

Testing MiniMax M3 on real tasks: repo refactor, screenshot debugging, and Spotify recommendations

 🆕New AI  Content type: Blog
andlukyane.com··Hacker News

Enshittification Merch That Actually Fights Enshittification

 🎨Graphic Design
eff.org·

How I stay connected (Bear Blog Carnival)

 🧘Digital Minimalism  Content type: Blog
hung.bearblog.dev·

Purpose-built local AI agents

 🤖AI  Content type: Blog

Notice from SASAC and MIIT on jointly launching the 2026 Special Action Plan for Real-Scene Training of Humanoid Robots and Embodied Intelligence

 🇨🇳China Tech Policy
threadreaderapp.com·

Job Searcher

 🤖AI  Content type: Blog
huggingface.co·

SafeRun: Enabling Determinism in LLM Planning for Running

 🏆LLM Benchmarking  Content type: Academic
arxiv.org·
