FluencyVE: Marrying Temporal-Aware Mamba with Bypass Attention for Video Editing
arxiv.org·2d

Abstract: Large-scale text-to-image diffusion models have achieved unprecedented success in image generation and editing. However, extending this success to video editing remains challenging. Recent video editing efforts have adapted pretrained text-to-image models by adding temporal attention mechanisms to handle video tasks. Unfortunately, these methods continue to suffer from temporal inconsistency and high computational overhead. In this study, we propose FluencyVE, a simple yet effective one-shot video editing approach. FluencyVE integrates the linear time-series module, Mamba, into a video editing model based on pretrained Stable Diffusion models, replacin…
