Sell Data to AI Algorithms Without Revealing It: Secure Data Valuation and Sharing via Homomorphic Encryption

Abstract: The rapid expansion of Artificial Intelligence is hindered by a fundamental friction in data markets: the value-privacy dilemma, where buyers cannot verify a dataset’s utility without inspection, yet inspection may expose the data (Arrow’s Information Paradox). We resolve this challenge by introducing the Trustworthy Influence Protocol (TIP), a privacy-preserving framework that enables prospective buyers to quantify the utility of external data without ever decrypting the raw assets. By integrating Homomorphic Encryption with gradient-based influence functions, our approach allows for the precise, blinded scoring of data points against a buyer’s specific AI model. To ensure scalability for Large Language Models (LLMs), we employ low-rank gradient projections that reduce computational overhead while maintaining near-perfect fidelity to plaintext baselines, as demonstrated across BERT and GPT-2 architectures. Empirical simulations in healthcare and generative AI domains validate the framework’s economic potential: we show that encrypted valuation signals achieve a high correlation with realized clinical utility and reveal a heavy-tailed distribution of data value in pre-training corpora where a minority of texts drive capability while the majority degrades it. These findings challenge prevailing flat-rate compensation models and offer a scalable technical foundation for a meritocratic, secure data economy.
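
The abstract does not say which encryption scheme TIP uses, which party holds the key, or the exact form of the influence score. As a rough illustration only, the sketch below assumes the score reduces to a dot product between two low-dimensional, integer-quantized gradient vectors, and uses a toy Paillier cryptosystem (additively homomorphic) as a stand-in. The names `encrypt`, `decrypt`, and `encrypted_dot` are hypothetical, and the hard-coded primes are far too small to be secure.

```python
import math
import random

# Toy Paillier keypair (assumption: stand-in scheme, not taken from the paper).
# The primes are tiny and fixed for readability; this offers no real security.
P, Q = 10007, 10009
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer; negatives are encoded modulo N."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Decrypt and map the result back to a signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def encrypted_dot(enc_vec, plain_vec):
    """Dot product of an encrypted vector with a plaintext integer vector.

    Multiplying ciphertexts adds the plaintexts; raising a ciphertext to a
    power k multiplies its plaintext by k. The result is still encrypted.
    """
    acc = encrypt(0)
    for c, s in zip(enc_vec, plain_vec):
        acc = acc * pow(c, s % N, N2) % N2
    return acc


# Hypothetical projected gradients, already quantized to small integers.
grad_a = [12, -50, 33, 7]   # encrypted by the key holder
grad_b = [-3, 20, 15, -8]   # kept in plaintext by the evaluating party

enc_a = [encrypt(x) for x in grad_a]
enc_score = encrypted_dot(enc_a, grad_b)   # computed without seeing grad_a
score = decrypt(enc_score)                 # only the key holder can read it
plain_score = sum(a * b for a, b in zip(grad_a, grad_b))
```

Here `score` and `plain_score` agree, which is the kind of fidelity to a plaintext baseline the abstract refers to, though only for this toy integer setting. A real deployment would need a vetted homomorphic encryption library, large keys, and a quantization range that keeps every score inside the plaintext space.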


Subjects:	Cryptography and Security (cs.CR); General Economics (econ.GN)
Cite as:	arXiv:2512.06033 [cs.CR]
	(or arXiv:2512.06033v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2512.06033 arXiv-issued DOI via DataCite

Submission history

From: Michael Yang
[v1] Thu, 4 Dec 2025 16:35:09 UTC (197 KB)
