Expanding the Language and Cultural Coverage of Common Crawl
We aim to enhance linguistic diversity in our dataset by inviting community contributions of non-English URLs and collaborating with MLCommons on a Language Identification campaign.