Title:Decoupled Entropy Minimization
Abstract:Entropy Minimization (EM) is beneficial to reducing class overlap, bridging domain gap, and restricting uncertainty for various tasks in machine learning, yet its potential is limited. To study the internal mechanism of EM, we reformulate and decouple the classical EM into two parts with opposite effects: cluster aggregation driving factor (CADF) rewards dominant classes and prompts a peaked output distribution, while gradient mitigation calibrator (GMC) penalizes high-confidence classes based on predicted probabilities. Furthermore, we reveal the limitations of classical EM caused by its coupled formulation: 1) reward colla…
Title:Decoupled Entropy Minimization
Abstract:Entropy Minimization (EM) is beneficial to reducing class overlap, bridging domain gap, and restricting uncertainty for various tasks in machine learning, yet its potential is limited. To study the internal mechanism of EM, we reformulate and decouple the classical EM into two parts with opposite effects: cluster aggregation driving factor (CADF) rewards dominant classes and prompts a peaked output distribution, while gradient mitigation calibrator (GMC) penalizes high-confidence classes based on predicted probabilities. Furthermore, we reveal the limitations of classical EM caused by its coupled formulation: 1) reward collapse impedes the contribution of high-certainty samples in the learning process, and 2) easy-class bias induces misalignment between output distribution and label distribution. To address these issues, we propose Adaptive Decoupled Entropy Minimization (AdaDEM), which normalizes the reward brought from CADF and employs a marginal entropy calibrator (MEC) to replace GMC. AdaDEM outperforms DEM*, an upper-bound variant of classical EM, and achieves superior performance across various imperfectly supervised learning tasks in noisy and dynamic environments.
| Comments: | To appear at NeurIPS 2025 (main conference), San Diego, CA, USA. Codes available at this https URL |
| Subjects: | Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML) |
| Cite as: | arXiv:2511.03256 [cs.LG] |
| (or arXiv:2511.03256v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2511.03256 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Xiang Xiang [view email] [v1] Wed, 5 Nov 2025 07:36:46 UTC (7,828 KB)