โก๏ธ LightX2V:
Light Video Generation Inference Framework
[ English | ไธญๆ ]
LightX2V is an advanced lightweight video generation inference framework engineered to deliver efficient, high-performance video synthesis solutions. This unified platform integrates multiple state-of-the-art video generation techniques, supporting diverse generation tasks including text-to-video (T2V) and image-to-video (I2V). X2V represents the transformation of different input modalities (X, such as text or images) into video output (V).
๐ Try it online now! Experience LightX2V without installation: LightX2V Online Service - Free, lightweight, and fast AI digital human video generation platform.
๐ฅ Latest News
December 15, 2025: ๐ Supported deployment on Hygon DCU.
December 4, 2025: ๐ Supported GGUF format model inference & deployment on Cambricon MLU590/MetaX C500.
November 24, 2025: ๐ We released 4-step distilled models for HunyuanVideo-1.5! These models enable ultra-fast 4-step inference without CFG requirements, achieving approximately 25x speedup compared to standard 50-step inference. Both base and FP8 quantized versions are now available: Hy1.5-Distill-Models.
November 21, 2025: ๐ We have supported the HunyuanVideo-1.5 video generation model since Day 0. With the same number of GPUs, LightX2V achieves over a 2x speedup and supports deployment on lower-memory GPUs (such as the 24GB RTX 4090). It also supports CFG/Ulysses parallelism, efficient offloading, TeaCache/MagCache, and more. We will soon publish more models on our HuggingFace page, including step-distillation, VAE-distillation, and other related models. Quantized models and lightweight VAE models are already available: Hy1.5-Quantized-Models for quantized inference, and LightTAE for HunyuanVideo-1.5 for fast VAE decoding. Refer to this for usage tutorials, or check out the examples directory for code examples.
๐ Performance Benchmarks (Updated on 2025.12.01)
๐ Cross-Framework Performance Comparison (H100)
| Framework | GPUs | Step Time | Speedup |
|---|---|---|---|
| Diffusers | 1 | 9.77s/it | 1x |
| xDiT | 1 | 8.93s/it | 1.1x |
| FastVideo | 1 | 7.35s/it | 1.3x |
| SGL-Diffusion | 1 | 6.13s/it | 1.6x |
| LightX2V | 1 | 5.18s/it | 1.9x ๐ |
| FastVideo | 8 | 2.94s/it | 1x |
| xDiT | 8 | 2.70s/it | 1.1x |
| SGL-Diffusion | 8 | 1.19s/it | 2.5x |
| LightX2V | 8 | 0.75s/it | 3.9x ๐ |
๐ Cross-Framework Performance Comparison (RTX 4090D)
| Framework | GPUs | Step Time | Speedup |
|---|---|---|---|
| Diffusers | 1 | 30.50s/it | 1x |
| FastVideo | 1 | 22.66s/it | 1.3x |
| xDiT | 1 | OOM | OOM |
| SGL-Diffusion | 1 | OOM | OOM |
| LightX2V | 1 | 20.26s/it | 1.5x ๐ |
| FastVideo | 8 | 15.48s/it | 1x |
| xDiT | 8 | OOM | OOM |
| SGL-Diffusion | 8 | OOM | OOM |
| LightX2V | 8 | 4.75s/it | 3.3x ๐ |
๐ LightX2V Performance Comparison
| Framework | GPU | Configuration | Step Time | Speedup |
|---|---|---|---|---|
| LightX2V | H100 | 8 GPUs + cfg | 0.75s/it | 1x |
| LightX2V | H100 | 8 GPUs + no cfg | 0.39s/it | 1.9x |
| LightX2V | H100 | 8 GPUs + no cfg + fp8 | 0.35s/it | 2.1x ๐ |
| LightX2V | 4090D | 8 GPUs + cfg | 4.75s/it | 1x |
| LightX2V | 4090D | 8 GPUs + no cfg | 3.13s/it | 1.5x |
| LightX2V | 4090D | 8 GPUs + no cfg + fp8 | 2.35s/it | 2.0x ๐ |
Note: All of the above performance data were measured on Wan2.1-I2V-14B-480P (40 steps, 81 frames). In addition, 4-step distilled models are available on our HuggingFace page.
๐ก Quick Start
For comprehensive usage instructions, please refer to our documentation: English Docs | ไธญๆๆๆกฃ
We highly recommend using Docker, as it is the simplest and fastest way to set up the environment. For details, please refer to the Quick Start section in the documentation.
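As a rough illustration, a Docker-based setup typically looks like the sketch below. The image name and tag here are assumptions, not the official ones; please take the exact pull/run commands from the Quick Start section of the documentation.
# Assumed image name/tag for illustration only; see the Quick Start docs for the official image
docker pull lightx2v/lightx2v:latest
docker run --gpus all -it --rm -v /path/to/models:/models lightx2v/lightx2v:latest bash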
Installation from Git
pip install -v git+https://github.com/ModelTC/LightX2V.git
Building from Source
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v .  # or: pip install -v .
(Optional) Install Attention/Quantize Operators
For instructions on installing attention and quantization operators, please refer to our documentation: English Docs | ไธญๆๆๆกฃ
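For example, two commonly used attention backends can be installed from PyPI. This is only a sketch; the exact versions and build prerequisites (CUDA toolchain, matching PyTorch build) are covered in the documentation.
# Flash Attention (compiles CUDA kernels; a matching CUDA toolchain is required)
pip install flash-attn --no-build-isolation
# Sage Attention
pip install sageattention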
Usage Example
# examples/wan/wan_i2v.py
"""
Wan2.2 image-to-video generation example.
This example demonstrates how to use LightX2V with Wan2.2 model for I2V generation.
"""
from lightx2v import LightX2VPipeline
# Initialize pipeline for Wan2.2 I2V task
# For wan2.1, use model_cls="wan2.1"
pipe = LightX2VPipeline(
model_path="/path/to/Wan2.2-I2V-A14B",
model_cls="wan2.2_moe",
task="i2v",
)
# Alternative: create generator from config JSON file
# pipe.create_generator(
# config_json="configs/wan22/wan_moe_i2v.json"
# )
# Enable offloading to significantly reduce VRAM usage with minimal speed impact
# Suitable for RTX 30/40/50 consumer GPUs
pipe.enable_offload(
cpu_offload=True,
offload_granularity="block", # For Wan models, supports both "block" and "phase"
text_encoder_offload=True,
image_encoder_offload=False,
vae_offload=False,
)
# Create generator manually with specified parameters
pipe.create_generator(
attn_mode="sage_attn2",
infer_steps=40,
height=480, # Can be set to 720 for higher resolution
width=832, # Can be set to 1280 for higher resolution
num_frames=81,
guidance_scale=[3.5, 3.5], # For wan2.1, guidance_scale is a scalar (e.g., 5.0)
sample_shift=5.0,
)
# Generation parameters
seed = 42
prompt = "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
# Standard Chinese negative prompt used with Wan models (camera shake, oversaturation, overexposure, blur, compression artifacts, malformed limbs, cluttered background, etc.)
negative_prompt = "镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走"
image_path = "/path/to/img_0.jpg"
save_result_path = "/path/to/save_results/output.mp4"
# Generate video
pipe.generate(
seed=seed,
image_path=image_path,
prompt=prompt,
negative_prompt=negative_prompt,
save_result_path=save_result_path,
)
๐ก More Examples: For more usage examples including quantization, offloading, caching, and other advanced configurations, please refer to the examples directory.
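For instance, a minimal text-to-video sketch built only from the calls shown above (LightX2VPipeline, create_generator, generate) might look like this. The task="t2v" value, the model path, and the parameter choices are assumptions for illustration; the scripts in the examples directory are authoritative.
# Hypothetical T2V sketch; task name, paths, and parameters are assumptions
from lightx2v import LightX2VPipeline
pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.1-T2V-14B",  # assumed local path to a Wan2.1 T2V checkpoint
    model_cls="wan2.1",
    task="t2v",  # assumed task identifier for text-to-video
)
pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=40,
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,  # scalar for wan2.1, per the comment in the I2V example above
    sample_shift=5.0,
)
pipe.generate(
    seed=42,
    prompt="A corgi running along a beach at sunset, cinematic lighting, shallow depth of field.",
    save_result_path="/path/to/save_results/t2v_output.mp4",
)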
๐ค Supported Model Ecosystem
Official Open-Source Models
- โ HunyuanVideo-1.5
- โ Wan2.1 & Wan2.2
- โ Qwen-Image
- โ Qwen-Image-Edit
- โ Qwen-Image-Edit-2509
Quantized and Distilled Models/LoRAs (๐ Recommended: 4-step inference)
Lightweight Autoencoder Models (๐ Recommended: fast inference & low memory usage)
- โ Autoencoders
Autoregressive Models
๐ Follow our HuggingFace page for the latest model releases from our team.
๐ก Refer to the Model Structure Documentation to quickly get started with LightX2V
๐ Frontend Interfaces
We provide multiple frontend interface deployment options:
- ๐จ Gradio Interface: Clean and user-friendly web interface, perfect for quick experience and prototyping
- ๐ฏ ComfyUI Interface: Powerful node-based workflow interface, supporting complex video generation tasks
- ๐ Windows One-Click Deployment: Convenient deployment solution designed for Windows users, featuring automatic environment configuration and intelligent parameter optimization
๐ก Recommended Solutions:
- First-time Users: We recommend the Windows one-click deployment solution
- Advanced Users: We recommend the ComfyUI interface for more customization options
- Quick Experience: The Gradio interface provides the most intuitive operation experience
๐ Core Features
๐ฏ Ultimate Performance Optimization
- ๐ฅ SOTA Inference Speed: Achieve ~20x acceleration via step distillation and system optimization (single GPU)
- โก๏ธ Revolutionary 4-Step Distillation: Compress original 40-50 step inference to just 4 steps without CFG requirements
- ๐ ๏ธ Advanced Operator Support: Integrated with cutting-edge operators including Sage Attention, Flash Attention, Radial Attention, q8-kernel, sgl-kernel, vllm
๐พ Resource-Efficient Deployment
- ๐ก Breaking Hardware Barriers: Run 14B models for 480P/720P video generation with only 8GB VRAM + 16GB RAM
- ๐ง Intelligent Parameter Offloading: Advanced disk-CPU-GPU three-tier offloading architecture with phase/block-level granular management (see the sketch after this list)
- โ๏ธ Comprehensive Quantization: Support for w8a8-int8, w8a8-fp8, w4a4-nvfp4, and other quantization strategies
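As a minimal sketch of the granular offloading mentioned above, reusing only the enable_offload call demonstrated in the usage example (the specific flag combination is an assumption and should be tuned per GPU):
# Low-VRAM offloading sketch; the flag values below are illustrative assumptions
from lightx2v import LightX2VPipeline
pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.2-I2V-A14B",
    model_cls="wan2.2_moe",
    task="i2v",
)
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="phase",  # Wan models support both "block" and "phase" granularity
    text_encoder_offload=True,
    image_encoder_offload=True,
    vae_offload=True,
)
Configurations along these lines, combined with quantized weights, are what enable the low-VRAM deployments described above; see the Low-Resource Deployment guide for the recommended settings.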
๐จ Rich Feature Ecosystem
- ๐ Smart Feature Caching: Intelligent caching mechanisms to eliminate redundant computations
- ๐ Parallel Inference: Multi-GPU parallel processing for enhanced performance
- ๐ฑ Flexible Deployment Options: Support for Gradio, service deployment, ComfyUI and other deployment methods
- ๐๏ธ Dynamic Resolution Inference: Adaptive resolution adjustment for optimal generation quality
- ๐๏ธ Video Frame Interpolation: RIFE-based frame interpolation for smooth frame rate enhancement
๐ Technical Documentation
๐ Method Tutorials
- Model Quantization - Comprehensive guide to quantization strategies
- Feature Caching - Intelligent caching mechanisms
- Attention Mechanisms - State-of-the-art attention operators
- Parameter Offloading - Three-tier storage architecture
- Parallel Inference - Multi-GPU acceleration strategies
- Changing Resolution Inference - U-shaped resolution strategy
- Step Distillation - 4-step inference technology
- Video Frame Interpolation - Based on RIFE technology
๐ ๏ธ Deployment Guides
- Low-Resource Deployment - Optimized 8GB VRAM solutions
- Low-Latency Deployment - Ultra-fast inference optimization
- Gradio Deployment - Web interface setup
- Service Deployment - Production API service deployment
- LoRA Model Deployment - Flexible LoRA deployment
๐งพ Contributing Guidelines
We maintain code quality through automated pre-commit hooks to ensure consistent formatting across the project.
Tip
Setup Instructions:
- Install the required dependencies: pip install ruff pre-commit
- Run the checks before committing: pre-commit run --all-files
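Optionally, you can also register the hooks once so they run automatically on every commit (standard pre-commit usage):
pre-commit install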
We appreciate your contributions to making LightX2V better!
๐ค Acknowledgments
We extend our gratitude to all the model repositories and research communities that inspired and contributed to the development of LightX2V. This framework builds upon the collective efforts of the open-source community.
๐ Star History
โ๏ธ Citation
If you find LightX2V useful in your research, please consider citing our work:
@misc{lightx2v,
author = {LightX2V Contributors},
title = {LightX2V: Light Video Generation Inference Framework},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
๐ Contact & Support
For questions, suggestions, or support, please feel free to reach out through:
- ๐ GitHub Issues - Bug reports and feature requests
Built with โค๏ธ by the LightX2V team