Add SkyReels V2: Infinite-Length Film Generative Model #11518

Open
wants to merge 145 commits into
base: main
from

Conversation

tolgacangoz
Contributor

@tolgacangoz tolgacangoz commented May 7, 2025

Thanks for the opportunity to fix #11374!

Original repo: https://github.com/SkyworkAI/SkyReels-V2
Paper: https://huggingface.co/papers/2504.13074

SkyReels V2's main contributions are summarized as follows:
• A comprehensive video captioner that understands shot language while capturing the general description of the video, which dramatically improves prompt adherence.
• Motion-specific preference optimization that enhances motion dynamics with a semi-automatic data collection pipeline.
• An effective diffusion-forcing adaptation that enables ultra-long video generation and story generation, providing a robust framework for extending temporal coherence and narrative depth.
• SkyCaptioner-V1 and the SkyReels-V2 series of models, including diffusion-forcing, text2video, image2video, camera director, and elements2video models in various sizes (1.3B, 5B, 14B), are open-sourced.

main_pipeline

TODOs:
FlowMatchUniPCMultistepScheduler: just copy-pasted from the original repo
SkyReelsV2Transformer3DModel: 90% WanTransformer3DModel
SkyReelsV2DiffusionForcingPipeline

SkyReelsV2DiffusionForcingImageToVideoPipeline: Includes Start/End Frame Control.
SkyReelsV2DiffusionForcingVideoToVideoPipeline: Extends a given video.
SkyReelsV2Pipeline
SkyReelsV2ImageToVideoPipeline
scripts/convert_skyreelsv2_to_diffusers.py
⬜ Did you make sure to update the documentation with your changes?
⬜ Did you write any new necessary tests?

prompt="A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window.",
num_inference_steps=30,
height=544,
width=960,
guidance_scale=6.0,
num_frames=97 or 257,
ar_step=5,
causal_block_size=5,
generator=torch.Generator(device="cuda").manual_seed(0 or 37),
overlap_history=None for short; 17 for long,
addnoise_condition=20
Skywork/SkyReels-V2-DF-1.3B-540P
| seed | num_frames | Original repo | diffusers integration |
|------|------------|---------------|-----------------------|
| 0 | 97 | original_0_short.mp4 | diffusers_0_short.mp4 |
| 37 | 97 | original_37_short.mp4 | diffusers_37_short.mp4 |
| 0 | 257 | original_0_long.mp4 | diffusers_0_long.mp4 |
| 37 | 257 | original_37_long.mp4 | diffusers_37_long.mp4 |
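
For anyone reproducing the comparison, the settings listed above can be collected per run as in the minimal sketch below. The helper name `build_call_kwargs` is hypothetical and not part of this PR; the sketch only builds a dictionary and assumes the pipeline call accepts these keyword names as written in the settings list. The generator would be created separately, e.g. with `torch.Generator(device="cuda").manual_seed(seed)`.

```python
# Illustrative sketch: `build_call_kwargs` is a hypothetical helper, not PR code.
PROMPT = (
    "A cat and a dog baking a cake together in a kitchen. The cat is carefully "
    "measuring flour, while the dog is stirring the batter with a wooden spoon. "
    "The kitchen is cozy, with sunlight streaming through the window."
)


def build_call_kwargs(long_video: bool) -> dict:
    """Collect the comparison settings for a short (97-frame) or long (257-frame) run."""
    return {
        "prompt": PROMPT,
        "num_inference_steps": 30,
        "height": 544,
        "width": 960,
        "guidance_scale": 6.0,
        "num_frames": 257 if long_video else 97,
        "ar_step": 5,
        "causal_block_size": 5,
        # overlap_history is None for short videos and 17 for long ones.
        "overlap_history": 17 if long_video else None,
        "addnoise_condition": 20,
    }


short_kwargs = build_call_kwargs(long_video=False)
long_kwargs = build_call_kwargs(long_video=True)
```

Only `num_frames` and `overlap_history` differ between the short and long runs; the seed (0 or 37) is varied through the generator.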

First of all, congratulations on this great work, and thanks for open-sourcing it, SkyReels Team! This PR is an attempt to integrate your model.
This PR is now ready for review for SkyReelsV2Transformer3DModel and SkyReelsV2DiffusionForcingPipeline. The other pipelines will follow right after the first round of feedback.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@yiyixuxu @a-r-r-o-w @linoytsaban @yjp999 @Howe2018 @RoseRollZhu @pftq @Langdx @guibinchen @qiudi0127 @nitinmukesh @tin2tin @ukaprch @okaris

tolgacangoz and others added 7 commits May 7, 2025 21:53
…usion forcing

- Introduced the drafts of `SkyReelsV2TextToVideoPipeline`, `SkyReelsV2ImageToVideoPipeline`, `SkyReelsV2DiffusionForcingPipeline`, and `FlowUniPCMultistepScheduler`.
@ukaprch

ukaprch commented May 8, 2025

It's about time. Thanks.

tolgacangoz added 22 commits May 8, 2025 20:01
Replaces custom attention implementations with `SkyReelsV2AttnProcessor2_0` and the standard `Attention` module.
Updates `WanAttentionBlock` to use `FP32LayerNorm` and `FeedForward`.
Removes the `model_type` parameter, simplifying model architecture and attention block initialization.
Introduces new classes `SkyReelsV2ImageEmbedding` and `SkyReelsV2TimeTextImageEmbedding` for enhanced image and time-text processing. Refactors the `SkyReelsV2Transformer3DModel` to integrate these embeddings, updating the constructor parameters for better clarity and functionality. Removes unused classes and methods to streamline the codebase.
…ds and begin reorganizing the forward pass.
…hod, integrating rotary embeddings and improving attention handling. Removes the deprecated `rope_apply` function and streamlines the attention mechanism for better integration and clarity.
…ethod by updating parameter names for clarity, integrating attention masks, and improving the handling of encoder hidden states.
…ethod by enhancing the handling of time embeddings and encoder hidden states. Updates parameter names for clarity and integrates rotary embeddings, ensuring better compatibility with the model's architecture.
tolgacangoz and others added 21 commits May 25, 2025 18:41
…itialization to directly assign the list of SkyReelsV2 components.
…ys convert query, key, and value to `torch.bfloat16`, simplifying the code and improving clarity.
…by adding VAE initialization and detailed prompt for video generation, improving clarity and usability of the documentation.
…and improve formatting in `pipeline_skyreels_v2_diffusion_forcing.py` to enhance code readability and maintainability.
…ine` from 5.0 to 6.0 to enhance video generation quality.
…definition of `SkyReelsV2DiffusionForcingPipeline` to ensure consistency and improve video generation quality.
…odel` to *ensure* correct tensor operations.
…peat_interleave` for improved efficiency in `SkyReelsV2Transformer3DModel`.
… with guidance scale and shift parameters for T2V and I2V. Remove unused `retrieve_latents` function to streamline the code.
…line` to use `deepcopy` for improved state management during inference steps.
@tolgacangoz tolgacangoz marked this pull request as ready for review May 27, 2025 06:06
…ngPipeline` for `overlap_history` and `addnoise_condition` parameters to improve long video generation guidance.
…nForcingPipeline` to clarify asynchronous inference settings and improve progress tracking during denoising steps.
Comment on lines 691 to 697
# 6. Denoising loop
num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
self._num_timesteps = len(step_matrix)
progress_bar_step = len(timesteps) / len(step_matrix)

with self.progress_bar(total=num_inference_steps) as progress_bar:
    for i, t in enumerate(step_matrix):
Contributor Author

@tolgacangoz tolgacangoz May 27, 2025

When we set `num_inference_steps=30` in asynchronous mode (e.g., by setting `ar_step=5`), the original repository's tqdm bar shows 50 steps, because `step_matrix` contains 50 elements; i.e., asynchronous inference requires more iterations. I handled this with `progress_bar_step = len(timesteps) / len(step_matrix)`: the user sees a 30-step bar that advances by 0.6 per iteration, while 50 iterations actually run. WDYT?
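
One way to account for the 50 iterations is the simplified model below. It is a sketch under stated assumptions, not the PR's step-matrix code: it assumes a temporal compression factor of 4 in the VAE (so 97 frames become 25 latent frames), and that each causal block starts denoising `ar_step` iterations after the previous one. The function name `staggered_schedule` is hypothetical.

```python
# Simplified model of asynchronous (staggered) denoising. Assumptions, not PR code:
# - temporal compression factor of 4: latent_frames = (num_frames - 1) // 4 + 1
# - block b starts `ar_step` iterations after block b - 1


def staggered_schedule(num_inference_steps: int, num_blocks: int, ar_step: int) -> list:
    """Return one row per iteration; each row holds the steps completed per block."""
    rows = []
    iteration = 0
    while True:
        iteration += 1
        row = [
            min(max(iteration - block * ar_step, 0), num_inference_steps)
            for block in range(num_blocks)
        ]
        rows.append(row)
        if all(done == num_inference_steps for done in row):
            return rows


num_frames, causal_block_size, ar_step, num_inference_steps = 97, 5, 5, 30
latent_frames = (num_frames - 1) // 4 + 1          # assumed compression factor of 4
num_blocks = latent_frames // causal_block_size

rows = staggered_schedule(num_inference_steps, num_blocks, ar_step)
progress_bar_step = num_inference_steps / len(rows)
print(len(rows), progress_bar_step)
```

Under these assumptions the iteration count is `num_inference_steps + (num_blocks - 1) * ar_step`, which matches the 50 elements and the 0.6 increment mentioned above.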

Contributor Author

@tolgacangoz tolgacangoz May 27, 2025

On second thought, this might be confusing. Should I use `with self.progress_bar(total=len(step_matrix)) as progress_bar:` instead, so that the user sees the real number of steps directly? Fractional stepping in a progress bar probably isn't common anyway. The difference between synchronous and asynchronous inference can be explained in the documentation.
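
The two reporting options can be compared with a small stand-in for the progress bar. This is only a sketch: `ProgressStub`, `run_fractional`, and `run_per_iteration` are hypothetical names, and the stub merely counts updates rather than rendering anything.

```python
class ProgressStub:
    """Minimal stand-in for a progress bar: tracks a running count against a total."""

    def __init__(self, total):
        self.total = total
        self.n = 0.0

    def update(self, amount=1):
        self.n += amount


def run_fractional(num_inference_steps: int, num_iterations: int) -> ProgressStub:
    """Option 1: total is num_inference_steps; each iteration adds a fraction."""
    bar = ProgressStub(total=num_inference_steps)
    step = num_inference_steps / num_iterations
    for _ in range(num_iterations):
        bar.update(step)
    return bar


def run_per_iteration(num_iterations: int) -> ProgressStub:
    """Option 2: total is the real iteration count; each iteration adds 1."""
    bar = ProgressStub(total=num_iterations)
    for _ in range(num_iterations):
        bar.update(1)
    return bar


fractional = run_fractional(num_inference_steps=30, num_iterations=50)
per_iteration = run_per_iteration(num_iterations=50)
```

Both variants end at their respective totals; the second avoids floating-point accumulation and shows the user the actual number of iterations.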

@tolgacangoz tolgacangoz marked this pull request as ready for review May 27, 2025 09:58
…e` by rounding the step size to one decimal place for improved readability during denoising steps.
…mentation for improved clarity and organization.
Development

Successfully merging this pull request may close these issues.

[Feature request] Integrate SkyReels-V2 support in diffusers
3 participants