
Draft Attention

This repository provides an overview of all resources for the paper "DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance".

Draft Attention is a plug-and-play acceleration method for video diffusion transformers.

Draft Attention reshapes the long queries and keys into frame-wise feature maps and applies 2D average pooling to downsample them.

Draft Attention uses the resulting low-resolution attention map as the reference that guides sparse attention at full length.

Draft Attention introduces minimal overhead by compressing the number of tokens by 128x or more.
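
Below is a minimal sketch of this pooling step, assuming PyTorch tensors of shape [batch, heads, tokens, head_dim] and illustrative function names; it is not the repository's implementation.

import torch
import torch.nn.functional as F

def pool_tokens(x, num_frames, latent_h, latent_w, pool_h, pool_w):
    # x: [batch, heads, num_frames * latent_h * latent_w, head_dim]
    b, h, n, d = x.shape
    assert n == num_frames * latent_h * latent_w
    # Reshape the flat token sequence into frame-wise 2D feature maps.
    x = x.reshape(b * h * num_frames, latent_h, latent_w, d).permute(0, 3, 1, 2)
    # Downsample each frame with 2D average pooling (e.g. an 8x16 patch -> one draft token).
    x = F.avg_pool2d(x, kernel_size=(pool_h, pool_w))
    # Flatten back into a much shorter token sequence.
    return x.permute(0, 2, 3, 1).reshape(b, h, -1, d)

def draft_attention_map(q, k, num_frames, latent_h, latent_w, pool_h, pool_w):
    # Low-resolution attention scores that guide the full-length sparse attention.
    q_low = pool_tokens(q, num_frames, latent_h, latent_w, pool_h, pool_w)
    k_low = pool_tokens(k, num_frames, latent_h, latent_w, pool_h, pool_w)
    scale = q_low.shape[-1] ** 0.5
    return torch.softmax(q_low @ k_low.transpose(-2, -1) / scale, dim=-1)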

🔥 News

  • [2025/05] We support HunyuanCustom with classifier-free guidance.

🎥 Demo

Hunyuan


Each demo compares Dense Attention, Sparse Video Generation (SVG), and Draft Attention (Ours) side by side.

  • Prompt: "The banks of the Thames, as the camera moves vertically from low to high."
  • Prompt: "On the green grass, the white-walled Leaning Tower of Pisa stands tall. The camera moves vertically from top to bottom during filming."
  • Prompt: "A blue long dress fell from the balcony clothes rack and dropped into the water on the ground."

Prompts are all from the Penguin Video Benchmark.

Videos are generated with 90% sparsity, seed 42, using the HunyuanVideo model at 768p on an A100 GPU.

HunyuanCustom


The demo shows the input image alongside videos generated with Dense Attention and Draft Attention (Ours).

Prompt: "Realistic, High-quality. A woman is drinking coffee at a café."

Videos are generated with seed 42 at 768p resolution on 8x A100 GPUs, with either dense attention or 90% sparse attention.

🚀 Quick Start

Model Preparation

Please follow the environment setup instructions and download the checkpoints from HunyuanVideo, Wan2.1, and HunyuanCustom.

Sparse Attention

We mainly adopt block sparse attention as the backend for draft attention.
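
As a rough illustration of how the pooled draft scores could drive block selection, the sketch below keeps only the highest-scoring key blocks for each query block at the target sparsity; shapes and names are assumptions, not the repository's kernel.

import torch

def select_key_blocks(draft_scores, sparsity_ratio):
    # draft_scores: [batch, heads, query_blocks, key_blocks] pooled attention scores.
    num_key_blocks = draft_scores.shape[-1]
    keep = max(1, int(round(num_key_blocks * (1.0 - sparsity_ratio))))
    # Mark the top-scoring key blocks for each query block; a block sparse
    # attention kernel then skips everything left unmarked.
    idx = draft_scores.topk(keep, dim=-1).indices
    mask = torch.zeros_like(draft_scores, dtype=torch.bool)
    return mask.scatter_(-1, idx, True)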

Video Generation

Simply run video generation with the scripts in hunyuan/, wan/, or hunyuan_custom/.

Evaluation results in the paper are mainly obtained with VBench on the Penguin Video Benchmark using HunyuanVideo and Wan2.1.

Use in Your Own Project

You can use draft attention much like flash attention through the Draft_Attention class defined in draft_attention.py or draft_attention_classifier_free_guidance.py.

Here is an example for the Hunyuan model:

from draft_attention import Draft_Attention

draft_attention = Draft_Attention(
    pool_h=8,            # height of the 2D average pooling window
    pool_w=16,           # width of the 2D average pooling window
    latent_h=48,         # height of each latent frame (in tokens)
    latent_w=80,         # width of each latent frame (in tokens)
    visual_len=126_720,  # total number of visual (video) tokens
    text_len=256,        # number of text tokens in the sequence
    sparsity_ratio=0.9,  # target attention sparsity (90%, as in the demos)
)

# The call mirrors the flash attention (varlen) interface.
x = draft_attention(
    q,
    k,
    v,
    attn_mask=attn_mask,
    causal=causal,
    drop_rate=drop_rate,
    cu_seqlens_q=cu_seqlens_q,
    cu_seqlens_kv=cu_seqlens_kv,
    max_seqlen_q=max_seqlen_q,
    max_seqlen_kv=max_seqlen_kv,
    batch_size=batch_size,
)
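
With these settings, each 8x16 patch of the 48x80 latent frame is presumably pooled into a single draft token, which matches the 128x token compression mentioned above, and sparsity_ratio=0.9 corresponds to the 90% sparse attention used in the demos. The remaining arguments follow the flash attention (varlen) calling convention, so the call can replace an existing flash attention call with minimal changes.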

✏️ TODO

  • Support any-resolution video generation with padding.
  • Support reordering of tokens for further block sparse grouping and faster hardware execution.

📑 Acknowledgement

This work is mainly contributed by Xuan and Chenxia.

🔗 BibTeX

If you find Draft Attention interesting, please cite it with the following BibTeX entry:

@article{shen2025draft,
  title={DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance},
  author={Shen, Xuan and Han, Chenxia and Zhou, Yufa and Xie, Yanyue and Gong, Yifan and Wang, Quanyi and Wang, Yiwei and Wang, Yanzhi and Zhao, Pu and Gu, Jiuxiang},
  journal={arXiv preprint arXiv:2505.14708},
  year={2025}
}
