How to prove your AI wasn't trained on private data

Discussed on DEV

The NYT sued OpenAI. Getty sued Stability AI. Every AI company with a copyright problem is now asking the same question: can we prove what was and wasn't in our training set? Until now, the honest answer was no. Training runs are one-way operations — you can say "we used Common Crawl" but you can't issue a cryptographic proof that a specific document was excluded. CompletenessManifest is a Python library that changes that. It's part of Cathedral-Constraint-Field, and it lets you build trainin...
