Disco-LoRA: Disentangled Composition of Content, Style, and Motion for Multi-concept Video Customization

Video customization based on Text-to-Video (T2V) models aims to learn specific features from reference data to generate controllable videos. While significant strides have been made in image stylization and video motion customization, simultaneously controlling multiple concepts, such as content, style, and motion, remains a major challenge. In this work, we systematically define the task of multi-concept video customization, which requires th...
