Title: NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
Abstract: Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page.
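The core idea in the abstract, noise that keeps the input's Fourier phase while taking its magnitude from Gaussian noise, can be illustrated with a short NumPy sketch. This is not the authors' implementation. The function name `fss_noise`, the hard radial mask, and the normalized-frequency convention for `cutoff` are all assumptions made for illustration; the abstract states only that FSS noise is controlled by a single frequency-cutoff parameter and does not say how the cutoff is applied.

```python
import numpy as np


def fss_noise(image, cutoff, rng):
    """Illustrative structured noise for a 2D grayscale image.

    Fourier magnitudes always come from Gaussian noise. Phases come from
    `image` at frequencies whose radius is <= `cutoff` and from the noise
    elsewhere. The hard radial mask is an assumption, not the paper's
    definition of FSS noise.
    """
    noise = rng.standard_normal(image.shape)
    noise_f = np.fft.fft2(noise)
    image_phase = np.angle(np.fft.fft2(image))

    # Radial frequency of each FFT bin, in cycles per pixel (max ~0.707).
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    keep_image_phase = np.sqrt(fx**2 + fy**2) <= cutoff

    phase = np.where(keep_image_phase, image_phase, np.angle(noise_f))
    mixed = np.abs(noise_f) * np.exp(1j * phase)

    # The spectrum stays conjugate-symmetric, so the imaginary part of the
    # inverse transform is only floating-point round-off.
    return np.fft.ifft2(mixed).real


rng = np.random.default_rng(0)
image = rng.random((16, 16))                                # stand-in image
rigid = fss_noise(image, 1.0, np.random.default_rng(1))     # all phases from image
free = fss_noise(image, -1.0, np.random.default_rng(1))     # plain Gaussian noise
```

Under these assumptions, a cutoff at or above the largest radial frequency keeps every phase of the input (the fully phase-preserving case), while a negative cutoff reduces to ordinary Gaussian noise; values in between trade structural rigidity for freedom.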
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Robotics (cs.RO) |
| Cite as: | arXiv:2512.05106 [cs.CV] (or arXiv:2512.05106v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2512.05106 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Yu Zeng
[v1] Thu, 4 Dec 2025 18:59:18 UTC (12,818 KB)