The Statistical Signature of LLMs (opens in new tab)
arXiv:2602.18152v1 Announce Type: new Abstract: Large language models generate text through probabilistic sampling from high-dimensional distributions, yet how this process reshapes the structural statistical organization of language remains incompletely characterized. Here we show that lossless compression provides a simple, model-agnostic measure of statistical regularity that differentiates generative regimes directly from surface text. We analyze compression behavior across three progres...
Read the original article