Common Corpus, an open training set for AI, goes global – and so should support for it

<p>As many of the AI stories on Walled Culture attest, one of the most contentious areas in the latest stage of AI development concerns the sourcing of training data. To create high-quality large language models (LLMs), massive quantities of training data are required. In the current genAI stampede, many companies are simply scraping everything they …</p>