ReasonX: MLLM-Guided Intrinsic Image Decomposition
arxiv.org

Abstract: Intrinsic image decomposition aims to separate images into physical components such as albedo, depth, normals, and illumination. While recent diffusion- and transformer-based models benefit from paired supervision from synthetic datasets, their generalization to diverse, real-world scenarios remains challenging. We propose ReasonX, a novel framework that leverages a multimodal large language model (MLLM) as a perceptual judge providing relative intrinsic comparisons, and uses these comparisons as GRPO rewards for fine-tuning intrinsic decomposition models on unlabeled, in-the-wild images. Unlike RL methods fo…
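The core idea of turning relative judgments into GRPO rewards can be illustrated with a minimal sketch. It rests on assumptions not stated in the (truncated) abstract: each of the G decompositions sampled for one image is rewarded by its pairwise win rate under the judge, and advantages are then normalized within the group in the usual GRPO way (population standard deviation here; implementations differ). `toy_judge`, `win_rate_rewards`, and `group_relative_advantages` are hypothetical names. `toy_judge` stands in for the MLLM by preferring the scalar closer to a target value, whereas a real judge would compare albedo, depth, normal, or shading maps. Only the reward and advantage computation is shown, not the policy update.

```python
from itertools import combinations
from statistics import fmean, pstdev
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def win_rate_rewards(candidates: Sequence[T], judge: Callable[[T, T], int]) -> List[float]:
    """Hypothetical reward: fraction of pairwise comparisons each candidate wins.

    Assumes judge(a, b) returns 0 if a is preferred and 1 if b is preferred.
    """
    n = len(candidates)
    wins = [0.0] * n
    for i, j in combinations(range(n), 2):
        if judge(candidates[i], candidates[j]) == 0:
            wins[i] += 1
        else:
            wins[j] += 1
    # Each candidate takes part in n - 1 comparisons.
    return [w / (n - 1) for w in wins]


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: reward minus group mean, divided by group std."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def toy_judge(a: float, b: float, target: float = 0.5) -> int:
    """Hypothetical stand-in for the MLLM judge: prefers the value closer to target."""
    return 0 if abs(a - target) <= abs(b - target) else 1


if __name__ == "__main__":
    # Four sampled "decompositions" for one image, reduced to scalars for illustration.
    group = [0.1, 0.45, 0.95, 0.6]
    rewards = win_rate_rewards(group, toy_judge)
    advantages = group_relative_advantages(rewards)
    print("rewards:", rewards)
    print("advantages:", advantages)
```

Because the advantages are centered on the group mean, candidates the judge ranks above their siblings get positive advantages and the rest negative ones. Under this sketch's assumptions, relative comparisons alone are enough to drive the update, with no absolute ground-truth intrinsic maps required.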