CoVEBench: Can Video Editing Models Handle Complex Instructions?
Videography · Content type: Academic

arxiv.org · Covered by ai-brief.liziran.com

While recent text-guided video editing models excel at elementary tasks (e.g., style transfer, object insertion), real-world user requests are highly compositional. A single prompt often demands multiple coupled edits, such as modifying subjects, actions, and camera views, while strictly preserving unrelated spatiotemporal content. Existing benchmarks, heavily constrained by isolated edits and coarse global metrics, fail to diagnose how models h...

Cited by 1 article

Video models stumble on compositional editing; MoE falls short on routing

ai-brief.liziran.com