🎬 Video/3D/4D generation | arxiv.org | Academic

CineOrchestra: Unified Entity-Centric Conditioning for Cinematic Video Generation

Cinematic video depicts multiple subjects acting or interacting at specific moments, captured with deliberate camera movement, and stitched together by shot transitions. Together, these elements demand a level of fine-grained control beyond current text-to-video models. Existing work addresses each axis in isolation: multi-subject personalization, temporal control, multi-shot synthesis, or camera control; no prior framework jointly integrates al...
