Physics > Data Analysis, Statistics and Probability
arXiv:2511.15723 (physics)
Abstract: Quantifying numerical data involves addressing two key challenges: first, determining whether the data can be naturally quantified, and second, identifying the numerical intervals or ranges of values that correspond to specific value classes, referred to as “quantums,” which represent statistically meaningful states. If such quantification is feasible, continuous streams of numerical data can be transformed into sequences of “symbols” that reflect the states of the system described by the measured parameter. People often perform this task intuitively, relying on common sense or practical experience, while information theory and computer science offer computable metrics for this purpose. In this study, we assess the applicability of metrics based on information compression and the Silhouette coefficient for quantifying numerical data. We also investigate the extent to which these metrics correlate with one another and with what is commonly referred to as “human intuition.” Our findings suggest that the ability to classify numeric data values into distinct categories is associated with a Silhouette coefficient above 0.65 and a Dip Test below 0.5; otherwise, the data can be treated as following a unimodal normal distribution. Furthermore, when quantification is possible, the Silhouette coefficient appears to align more closely with human intuition than the “normalized centroid distance” method derived from an information-compression perspective.
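To make the Silhouette criterion from the abstract concrete, here is a minimal sketch in plain Python of how a one-dimensional sample could be scored against the 0.65 threshold. Several things are assumptions for illustration, not the paper's procedure: the two-cluster k-means split, the function names (`two_means_1d`, `silhouette_1d`, `looks_quantifiable`), and the toy data. The Dip Test condition from the abstract is not implemented here.

```python
from statistics import mean


def two_means_1d(values, iters=100):
    """Lloyd's k-means with k=2 on scalars (assumed clustering step).

    Returns a list of 0/1 labels, one per value.
    """
    centers = [min(values), max(values)]  # deterministic init at the extremes
    labels = None
    for _ in range(iters):
        new_labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels


def silhouette_1d(values, labels):
    """Mean silhouette coefficient using absolute difference as distance."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for one cluster; report 0
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean([abs(v - u) for u in other])
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


def looks_quantifiable(values, threshold=0.65):
    """Apply only the Silhouette part of the abstract's criterion."""
    return silhouette_1d(values, two_means_1d(values)) > threshold


# Toy data (hypothetical): two tight groups versus evenly spread values.
bimodal = [1.0, 1.1, 0.9, 1.05, 5.0, 5.1, 4.9, 5.05]
spread = [float(i) for i in range(10)]
print(looks_quantifiable(bimodal), looks_quantifiable(spread))
```

On this toy input the two tight groups score well above the threshold, while the evenly spread values fall below it, which matches the qualitative reading of the abstract. A fuller check would also apply the Dip Test condition and would not fix the number of clusters at two.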
| Comments: | 9 pages, 5 figures, 1 table |
| Subjects: | Data Analysis, Statistics and Probability (physics.data-an); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Numerical Analysis (math.NA) |
| Cite as: | arXiv:2511.15723 [physics.data-an] (or arXiv:2511.15723v1 [physics.data-an] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2511.15723 (arXiv-issued DOI via DataCite) |
Submission history
From: Anton Kolonin Dr.
[v1] Sat, 15 Nov 2025 04:44:18 UTC (3,462 KB)