[Wan] Fix VAE sampling mode in WanVideoToVideoPipeline #11639

Open: wants to merge 2 commits into base: main

Conversation

Contributor

@tolgacangoz tolgacangoz commented May 30, 2025

While integrating the SkyReels-V2 models, I noticed that the major Wan-related repos, including Wan-Video/Wan2.1, modelscope/DiffSynth-Studio, and SkyworkAI/SkyReels-V2, use sample_mode == "argmax" when retrieving latents from their VAE encoder output. The other Wan pipelines in diffusers do the same. Am I correct?
This PR also fixes a typo.
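
To make the difference between the two modes concrete, here is a minimal sketch using a toy diagonal Gaussian posterior. The class `ToyDiagonalGaussian` and the helper `toy_retrieve_latents` are hypothetical stand-ins written for illustration only; they are not the diffusers implementation. The sketch assumes that "argmax" means taking the mode of the posterior (which for a Gaussian is its mean) and that "sample" means drawing mean + std * noise.

```python
import math
import random


class ToyDiagonalGaussian:
    """Toy stand-in for a VAE encoder posterior (hypothetical, not the diffusers class)."""

    def __init__(self, mean, logvar):
        self.mean = list(mean)
        # Standard deviation derived from the log-variance
        self.std = [math.exp(0.5 * lv) for lv in logvar]

    def sample(self, rng):
        # Stochastic draw: mean + std * eps, with eps ~ N(0, 1)
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]

    def mode(self):
        # The mode of a Gaussian is its mean, so this is deterministic
        return list(self.mean)


def toy_retrieve_latents(posterior, rng=None, sample_mode="sample"):
    """Hypothetical helper: choose latents from the posterior by mode name."""
    if sample_mode == "sample":
        if rng is None:
            rng = random.Random()
        return posterior.sample(rng)
    if sample_mode == "argmax":
        return posterior.mode()
    raise ValueError(f"unknown sample_mode: {sample_mode!r}")


posterior = ToyDiagonalGaussian(mean=[0.5, -1.0], logvar=[-2.0, -2.0])

# "argmax" ignores the random generator and always returns the mean
argmax_a = toy_retrieve_latents(posterior, sample_mode="argmax")
argmax_b = toy_retrieve_latents(posterior, sample_mode="argmax")

# "sample" adds noise scaled by the posterior standard deviation
sampled = toy_retrieve_latents(posterior, random.Random(0), sample_mode="sample")
```

Under these assumptions, "argmax" encodes the input video to the same latents on every run, while "sample" perturbs them with encoder noise before the pipeline adds its own scheduler noise.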

Input video: hiker.mp4
Previous: wan-v2v.mp4
Current: wan-v2v-fixed.mp4

I am unsure whether the fix is supposed to produce a visible difference 🤔.

Reproducer

#!pip uninstall diffusers -yq
#!pip install git+https://github.com/tolgacangoz/diffusers.git@fix-wanv2v-vae ftfy -q
import torch
from diffusers.utils import load_video, export_to_video
from diffusers import AutoencoderKLWan, WanVideoToVideoPipeline, UniPCMultistepScheduler

# Available models: Wan-AI/Wan2.1-T2V-14B-Diffusers, Wan-AI/Wan2.1-T2V-1.3B-Diffusers
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(
    model_id, subfolder="vae", torch_dtype=torch.float32
)
pipe = WanVideoToVideoPipeline.from_pretrained(
    model_id, vae=vae, torch_dtype=torch.bfloat16
)
flow_shift = 3.0  # 5.0 for 720P, 3.0 for 480P
pipe.scheduler = UniPCMultistepScheduler.from_config(
    pipe.scheduler.config, flow_shift=flow_shift
)
# Alternative for limited VRAM: offload to CPU instead of moving the whole pipeline to the GPU
# pipe.enable_model_cpu_offload()
pipe = pipe.to("cuda")

prompt = "A robot standing on a mountain top. The sun is setting in the background"
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
video = load_video(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/hiker.mp4"
)
output = pipe(
    video=video,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=480,
    width=512,
    guidance_scale=7.0,
    strength=0.7,
    generator=torch.Generator(device="cuda").manual_seed(0),
).frames[0]

export_to_video(output, "wan-v2v.mp4", fps=16)

@DN6 @a-r-r-o-w

@tolgacangoz changed the title from "Fix-wanv2v-vae" to "Fix VAE sampling mode in WanVideoToVideoPipeline" on May 30, 2025
@tolgacangoz changed the title from "Fix VAE sampling mode in WanVideoToVideoPipeline" to "[Wan] Fix VAE sampling mode in WanVideoToVideoPipeline" on May 30, 2025
Member

@a-r-r-o-w a-r-r-o-w left a comment

Looks correct to me because we use argmax in the other pipelines as well IIRC

@ukaprch

ukaprch commented May 30, 2025

In your "Current" (fixed) mp4, there appear to be outlier pixels above or below the valid RGB range. I'm talking about those red spots and the light beam that moves across the video.
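
One way to check an observation like this numerically is to count values that fall outside the expected range. The sketch below is only an illustration: the helper `count_out_of_range` is hypothetical, and it assumes the frames are available as nested lists of floats nominally in [0, 1] before any clipping is applied. Real pipeline outputs may use a different layout or may already be clipped.

```python
def count_out_of_range(frames, low=0.0, high=1.0):
    """Count values strictly outside [low, high] in nested frame data.

    Hypothetical helper; assumes `frames` is an arbitrarily nested
    list/tuple of numbers (e.g. frames -> rows -> pixels -> channels).
    """
    total = 0
    outside = 0
    stack = [frames]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            # Descend into nested containers
            stack.extend(item)
        else:
            total += 1
            if item < low or item > high:
                outside += 1
    return outside, total


# Toy data: two 1x1 "frames" with three channels each (made-up values)
toy_frames = [[[0.1, 0.9, 1.2]], [[-0.05, 0.5, 0.5]]]
outside, total = count_out_of_range(toy_frames)
```

A nonzero `outside` count on unclipped decoder output would be consistent with the red spots described above, though it would not by itself identify their cause.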

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tolgacangoz
Contributor Author

> In your "Current" (fixed) mp4, there appear to be outlier pixels above or below the valid RGB range. I'm talking about those red spots and the light beam that moves across the video.

I will return to this PR once I complete the SkyReels-V2 integration, which is almost done.
