Geometric Data Valuation via Leverage Scores
arxiv.org·4h


Abstract: Shapley data valuation provides a principled, axiomatic framework for assigning importance to individual datapoints, and has gained traction in dataset curation, pruning, and pricing. However, it is a combinatorial measure that requires evaluating marginal utility across all subsets of the data, making it computationally infeasible at scale. We propose a geometric alternative based on statistical leverage scores, which quantify each datapoint’s structural influence in the representation space by measuring how much it extends the span of the dataset and contributes to the effective dimensionality of the training proble…
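
To make the notion of a statistical leverage score concrete, here is a minimal sketch of the textbook definition: for a data matrix X with one datapoint per row, the leverage of row i is the i-th diagonal entry of the projection onto the column space of X, computed here from a thin SVD. The function name `leverage_scores`, the synthetic data, and the rank tolerance are assumptions made for illustration. The abstract is truncated, so the paper's exact formulation (for example, any regularization or normalization) may differ from this sketch.

```python
import numpy as np

def leverage_scores(X):
    """Textbook leverage score of each row of X (shape n x d).

    Illustrative helper, not taken from the paper: the score of row i is
    the squared norm of row i of U, where X = U S V^T is a thin SVD.
    """
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    # Drop directions whose singular values are numerically zero,
    # so rank-deficient inputs are handled sensibly.
    tol = S.max() * max(X.shape) * np.finfo(S.dtype).eps
    U = U[:, S > tol]
    return np.sum(U**2, axis=1)

# Synthetic example (assumed data): 200 points in 5 dimensions,
# with the first point scaled so it lies far from the rest.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[0] *= 50.0

scores = leverage_scores(X)
print("highest-leverage row:", int(np.argmax(scores)))
print("sum of scores:", scores.sum())
```

Each score lies between 0 and 1, and the scores sum to the rank of X, which is one way to read the abstract's remark about contribution to dimensionality. Computing them costs one SVD rather than an evaluation over subsets of the data.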
