
Add SkyReels V2: Infinite-Length Film Generative Model #11518


Draft · wants to merge 187 commits into base: main

Conversation

@tolgacangoz (Contributor) commented May 7, 2025

Thanks for the opportunity to fix #11374!

Original Work

Original repo: https://github.com/SkyworkAI/SkyReels-V2
Paper: https://huggingface.co/papers/2504.13074

SkyReels V2's main contributions are summarized as follows:
• A comprehensive video captioner that understands shot language while capturing a general description of the video, which dramatically improves prompt adherence.
• Motion-specific preference optimization that enhances motion dynamics with a semi-automatic data collection pipeline.
• An effective diffusion-forcing adaptation that enables ultra-long video generation and story generation, providing a robust framework for extending temporal coherence and narrative depth.
• SkyCaptioner-V1 and the SkyReels-V2 series of models, including diffusion-forcing, text2video, image2video, camera-director, and elements2video models in various sizes (1.3B, 5B, 14B), are open-sourced.

[Figure: main_pipeline]

TODOs:
• FlowMatchUniPCMultistepScheduler: just copy-pasted from the original repo (a scheduler-swap sketch follows this list)
• SkyReelsV2Transformer3DModel: 90% WanTransformer3DModel
• SkyReelsV2DiffusionForcingPipeline
• SkyReelsV2DiffusionForcingImageToVideoPipeline: includes FLF2V.
• SkyReelsV2DiffusionForcingVideoToVideoPipeline: extends a given video.
• SkyReelsV2Pipeline
• SkyReelsV2ImageToVideoPipeline
• scripts/convert_skyreelsv2_to_diffusers.py
⬜ Did you make sure to update the documentation with your changes?
⬜ Did you write any new necessary tests?
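
As a quick sanity check for the scheduler item above, here is a minimal sketch (not part of the PR) that attaches the new scheduler to a loaded pipeline. It assumes `FlowMatchUniPCMultistepScheduler` is importable from this branch and reuses the standard `from_config` scheduler API; the class name and signature may still change before merge.

```python
# Minimal sketch; assumption: FlowMatchUniPCMultistepScheduler is exported by this branch.
# `from_config` is the standard diffusers scheduler constructor; the class name is taken
# from the TODO list above and may still change before merge.
import torch
from diffusers import FlowMatchUniPCMultistepScheduler, SkyReelsV2DiffusionForcingPipeline

pipe = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
    "tolgacangoz/SkyReels-V2-DF-1.3B-540P-Diffusers", torch_dtype=torch.bfloat16
)
# Rebuild the scheduler from the pipeline's existing scheduler config.
pipe.scheduler = FlowMatchUniPCMultistepScheduler.from_config(pipe.scheduler.config)
```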

T2V with Diffusion Forcing

Skywork/SkyReels-V2-DF-1.3B-540P

| Setting | Original repo | diffusers integration |
|---|---|---|
| seed 0, num_frames 97 | original_0_short.mp4 | diffusers_0_short.mp4 |
| seed 37, num_frames 97 | original_37_short.mp4 | diffusers_37_short.mp4 |
| seed 0, num_frames 257 | original_0_long.mp4 | diffusers_0_long.mp4 |
| seed 37, num_frames 257 | original_37_long.mp4 | diffusers_37_long.mp4 |
!pip install git+https://github.com/tolgacangoz/diffusers.git@skyreels-v2 ftfy -q

```python
import torch
from diffusers import AutoencoderKLWan, SkyReelsV2DiffusionForcingPipeline
from diffusers.utils import export_to_video

vae = AutoencoderKLWan.from_pretrained(
    "tolgacangoz/SkyReels-V2-DF-1.3B-540P-Diffusers",
    subfolder="vae",
    torch_dtype=torch.float32,
)
pipe = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
    "tolgacangoz/SkyReels-V2-DF-1.3B-540P-Diffusers",
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")
pipe.transformer.set_ar_attention(causal_block_size=5)

prompt = "A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window."

output = pipe(
    prompt=prompt,
    num_inference_steps=30,
    height=544,
    width=960,
    num_frames=97,
    ar_step=5,  # Controls asynchronous inference (0 for synchronous mode)
    generator=torch.Generator(device="cuda").manual_seed(0),
    overlap_history=None,  # Number of frames to overlap for smooth transitions in long videos; 17 for long videos
    addnoise_condition=20,  # Improves consistency in long video generation
).frames[0]
export_to_video(output, "video.mp4", fps=24, quality=8)
```

"""
You can set `ar_step=5` to enable asynchronous inference. When asynchronous inference,
`causal_block_size=5` is recommended while it is not supposed to be set for
synchronous generation. Asynchronous inference will take more steps to diffuse the
whole sequence which means it will be SLOWER than synchronous mode. In our
experiments, asynchronous inference may improve the instruction following and visual consistent performance.
"""

tolgacangoz and others added 7 commits May 7, 2025 21:53
…usion forcing

- Introduced the drafts of `SkyReelsV2TextToVideoPipeline`, `SkyReelsV2ImageToVideoPipeline`, `SkyReelsV2DiffusionForcingPipeline`, and `FlowUniPCMultistepScheduler`.
@ukaprch commented May 8, 2025

It's about time. Thanks.

tolgacangoz added 22 commits May 8, 2025 20:01
Replaces custom attention implementations with `SkyReelsV2AttnProcessor2_0` and the standard `Attention` module.
Updates `WanAttentionBlock` to use `FP32LayerNorm` and `FeedForward`.
Removes the `model_type` parameter, simplifying model architecture and attention block initialization.
Introduces new classes `SkyReelsV2ImageEmbedding` and `SkyReelsV2TimeTextImageEmbedding` for enhanced image and time-text processing. Refactors the `SkyReelsV2Transformer3DModel` to integrate these embeddings, updating the constructor parameters for better clarity and functionality. Removes unused classes and methods to streamline the codebase.
…ds and begin reorganizing the forward pass.
…hod, integrating rotary embeddings and improving attention handling. Removes the deprecated `rope_apply` function and streamlines the attention mechanism for better integration and clarity.
…ethod by updating parameter names for clarity, integrating attention masks, and improving the handling of encoder hidden states.
…ethod by enhancing the handling of time embeddings and encoder hidden states. Updates parameter names for clarity and integrates rotary embeddings, ensuring better compatibility with the model's architecture.
…ForcingVideoToVideoPipeline`, enhancing support for Video-to-Video (v2v) generation. Introduce video input handling, update latent preparation logic, and improve error handling for input parameters.
… the `image_encoder` and `image_processor` dependencies. Update the CPU offload sequence accordingly.
@tolgacangoz tolgacangoz marked this pull request as draft May 29, 2025 08:07
…latent preparation logic and condition handling. Update image input type to `Optional`, streamline video condition processing, and improve handling of `last_image` during latent generation.
…ration for long video generation. Introduce new parameters for video handling, overlap history, and causal block size. Update logic to accommodate both short and long video scenarios, ensuring compatibility and improved processing.
…ideoToVideoPipeline` to ensure proper noise scaling during latent generation.
…pport for `last_image` parameter and refining latent frame calculations. Update preprocessing logic.
…eoPipeline` by correcting variable names and reintroducing latent mean and standard deviation calculations. Update logic for frame preparation and sampling to ensure accurate video generation.
…latent handling by enforcing tensor input for video, updating frame preparation logic, and adjusting default frame count. Enhance preprocessing and postprocessing steps for better integration.
…ForcingImageToVideoPipeline` to ensure correct dimensionality for video conditions and latent conditions.
…VideoPipeline` to handle tensor dimensions more robustly, ensuring compatibility with both 3D and 4D video inputs.
…teration print statements for better debugging. Clean up unused code related to prefix video latents length calculation in `SkyReelsV2DiffusionForcingImageToVideoPipeline`.
Successfully merging this pull request may close these issues.

[Feature request] Integrate SkyReels-V2 support in diffusers