hao-ai-lab/FastVideo

FastVideo is a unified framework for accelerated video generation.

It features a clean, consistent API that works across popular video models, making it easier for developers to author new models and incorporate system- or kernel-level optimizations. With FastVideo's optimizations, you can achieve more than a 3x inference speedup compared to other systems.

| Documentation | Quick Start | πŸ€— FastHunyuan | πŸ€— FastMochi | πŸŸ£πŸ’¬ Slack |

Key Features

FastVideo has the following features:

  • State-of-the-art performance optimizations for inference
  • Cutting edge models
    • Wan2.1 T2V, I2V
    • HunyuanVideo
    • FastHunyuan: consistency distilled video diffusion models for 8x inference speedup.
    • StepVideo T2V
  • Distillation support
    • Recipes for video DiT, based on PCM.
    • Support distilling/finetuning/inferencing state-of-the-art open video DiTs: 1. Mochi 2. Hunyuan.
  • Scalable training with FSDP, sequence parallelism, and selective activation checkpointing, with near linear scaling to 64 GPUs.
  • Memory efficient finetuning with LoRA, precomputed latent, and precomputed text embeddings.

Getting Started

We recommend using an environment manager such as Conda to create a clean environment:

# Create and activate a new conda environment
conda create -n fastvideo python=3.12
conda activate fastvideo

# Install FastVideo
pip install fastvideo
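
To confirm the install, a quick import check is enough. (The `__version__` attribute is an assumption here, since most pip-installed packages expose one; a bare import succeeding is the real test.)

# Verify that FastVideo imports cleanly
import fastvideo

# __version__ is assumed; if it is absent, the fallback string still confirms the import worked
print(getattr(fastvideo, "__version__", "fastvideo imported successfully"))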

Please see our docs for more detailed installation instructions.

Inference

Generating Your First Video

Here's a minimal example to generate a video using the default settings. Create a file called example.py with the following code:

from fastvideo import VideoGenerator

def main():
    # Create a video generator with a pre-trained model
    generator = VideoGenerator.from_pretrained(
        "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
        num_gpus=1,  # Adjust based on your hardware
    )

    # Define a prompt for your video
    prompt = "A curious raccoon peers through a vibrant field of yellow sunflowers, its eyes wide with interest."

    # Generate the video
    video = generator.generate_video(
        prompt,
        return_frames=True,  # Also return frames from this call (defaults to False)
        output_path="my_videos/",  # Controls where videos are saved
        save_video=True
    )

if __name__ == '__main__':
    main()

Run the script with:

python example.py
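
Beyond the defaults, you can scale out and customize generation. The sketch below runs the same model on two GPUs and passes extra sampling options; note that `negative_prompt` and `seed` are assumed keyword names based on common diffusion APIs, so check the inference quick start for the exact parameters your FastVideo version exposes.

from fastvideo import VideoGenerator

def main():
    # Additional GPUs shard the work (FastVideo uses sequence parallelism)
    generator = VideoGenerator.from_pretrained(
        "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
        num_gpus=2,  # requires two visible GPUs
    )

    prompt = "A timelapse of storm clouds rolling over a mountain ridge."

    # negative_prompt and seed are assumed parameter names; consult the
    # inference quick start for what your installed version supports.
    generator.generate_video(
        prompt,
        negative_prompt="blurry, low quality",
        seed=42,
        output_path="my_videos/",
        save_video=True,
    )

if __name__ == '__main__':
    main()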

For a more detailed guide, please see our inference quick start.

Other docs:

  β€’ Distillation and Finetuning

πŸ“‘ Development Plan

  • More models support
    • Add StepVideo to V1
  • Optimization features
    • Teacache in V1
    • SageAttention in V1
  • Code updates
    • V1 Configuration API
    • Support Training in V1

🀝 Contributing

We welcome all contributions. Please check out our contributing guide.

Acknowledgement

We learned from and reused code from several open-source projects.

We thank MBZUAI and Anyscale for their support throughout this project.

Citation

If you use FastVideo for your research, please cite our paper:

@misc{zhang2025fastvideogenerationsliding,
      title={Fast Video Generation with Sliding Tile Attention},
      author={Peiyuan Zhang and Yongqi Chen and Runlong Su and Hangliang Ding and Ion Stoica and Zhenghong Liu and Hao Zhang},
      year={2025},
      eprint={2502.04507},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.04507},
}

@misc{ding2025efficientvditefficientvideodiffusion,
      title={Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile},
      author={Hangliang Ding and Dacheng Li and Runlong Su and Peiyuan Zhang and Zhijie Deng and Ion Stoica and Hao Zhang},
      year={2025},
      eprint={2502.06155},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.06155},
}