arXiv:2601.11978v1 Announce Type: cross Abstract: Unauthorized screen-shooting poses a critical data leakage risk. Resisting screen-shooting attacks typically requires high-strength watermark embedding, inevitably degrading the cover image. To resolve the robustness-fidelity conflict, non-intrusive watermarking has emerged as a solution by constructing logical verification keys without altering the original content. However, existing non-intrusive schemes lack the capacity to withstand screen-shooting noise. While deep learning offers a potential remedy, we observe that directly applying it leads to a previously underexplored failure mode, the Structural Shortcut: networks tend to learn trivial identity mappings and neglect the image-watermark binding. Furthermore, even when logical bindin…
arXiv:2601.11978v1 Announce Type: cross Abstract: Unauthorized screen-shooting poses a critical data leakage risk. Resisting screen-shooting attacks typically requires high-strength watermark embedding, inevitably degrading the cover image. To resolve the robustness-fidelity conflict, non-intrusive watermarking has emerged as a solution by constructing logical verification keys without altering the original content. However, existing non-intrusive schemes lack the capacity to withstand screen-shooting noise. While deep learning offers a potential remedy, we observe that directly applying it leads to a previously underexplored failure mode, the Structural Shortcut: networks tend to learn trivial identity mappings and neglect the image-watermark binding. Furthermore, even when logical binding is enforced, standard training strategies cannot fully bridge the noise gap, yielding suboptimal robustness against physical distortions. In this paper, we propose NiMark, an end-to-end framework addressing these challenges. First, to eliminate the structural shortcut, we introduce the Sigmoid-Gated XOR (SG-XOR) estimator to enable gradient propagation for the logical operation, effectively enforcing rigid image-watermark binding. Second, to overcome the robustness bottleneck, we devise a two-stage training strategy integrating a restorer to bridge the domain gap caused by screen-shooting noise. Experiments demonstrate that NiMark consistently outperforms representative state-of-the-art methods against both digital attacks and screen-shooting noise, while maintaining zero visual distortion.