A Profile of Common Crawl
pxlnv.com·3h


Alex Reisner, the Atlantic:

The Common Crawl Foundation is little known outside of Silicon Valley. For more than a decade, the nonprofit has been scraping billions of webpages to build a massive archive of the internet. This database — large enough to be measured in petabytes — is made freely available for research. In recent years, however, this archive has been put to a controversial purpose: AI companies including OpenAI, Google, Anthropic, Nvidia, Meta, …
