Abstract:Class Activation Mapping (CAM) and its extensions have become indispensable tools for visualizing the evidence behind deep network predictions. However, by relying on a final softmax classifier, these methods suffer from two fundamental distortions: additive logit shifts that arbitrarily bias importance scores, and sign collapse that conflates excitatory and inhibitory features. We propose a simple, architecture-agnostic dual-branch sigmoid head that decouples localization from classification. Given any pretrained model, we clone its classification head into a parallel branch ending in per-class sigmoid outputs, freeze the original softmax head, and fine-tune only the sigmoid branch with class-balanced binary supervision. At in…
Abstract:Class Activation Mapping (CAM) and its extensions have become indispensable tools for visualizing the evidence behind deep network predictions. However, by relying on a final softmax classifier, these methods suffer from two fundamental distortions: additive logit shifts that arbitrarily bias importance scores, and sign collapse that conflates excitatory and inhibitory features. We propose a simple, architecture-agnostic dual-branch sigmoid head that decouples localization from classification. Given any pretrained model, we clone its classification head into a parallel branch ending in per-class sigmoid outputs, freeze the original softmax head, and fine-tune only the sigmoid branch with class-balanced binary supervision. At inference, softmax retains recognition accuracy, while class evidence maps are generated from the sigmoid branch – preserving both magnitude and sign of feature contributions. Our method integrates seamlessly with most CAM variants and incurs negligible overhead. Extensive evaluations on fine-grained tasks (CUB-200-2011, Stanford Cars) and WSOL benchmarks (ImageNet-1K, OpenImages30K) show improved explanation fidelity and consistent Top-1 Localization gains – without any drop in classification accuracy. Code is available at this https URL.
| Comments: | Accepted at BMVC 2025 |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG) |
| Cite as: | arXiv:2511.05590 [cs.CV] |
| (or arXiv:2511.05590v1 [cs.CV] for this version) | |
| https://doi.org/10.48550/arXiv.2511.05590 arXiv-issued DOI via DataCite |
Submission history
From: Yoojin Oh [view email] [v1] Wed, 5 Nov 2025 12:28:14 UTC (2,083 KB)