Wave-Particle (Continuous-Discrete) Dualistic Visual Tokenization for Unified Understanding and Generation
arxiv.org·7h

Abstract: The unification of understanding and generation within a single multi-modal large model (MLLM) remains a significant challenge, largely due to the dichotomy between continuous and discrete visual tokenizations. Continuous tokenizers (CT) achieve strong performance by bridging multiple independently trained understanding and generation modules, but suffer from complex multi-stage pipelines and substantial engineering overhead. Conversely, discrete tokenizers (DT) offer a conceptually elegant alternative by quantizing each image into a primitive, but inevitably lead to information loss and performance degradation. To resolve this tension, we question the binar…
