[`Wan`] Fix VAE sampling mode in `WanVideoToVideoPipeline` #11639

tolgacangoz · 2025-05-30T09:27:35Z

While integrating the SkyReels-V2 models, I noticed that the major Wan-related repos, including Wan-Video/Wan2.1, modelscope/DiffSynth-Studio, and SkyworkAI/SkyReels-V2, prefer sample_mode == "argmax" for the encoder output of their VAEs. The other Wan pipelines in diffusers do the same. Am I correct?
This PR also fixes a typo.
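For readers unfamiliar with the two modes, here is a minimal sketch of the difference, assuming the VAE posterior is a diagonal Gaussian. `ToyGaussian` and `ToyEncoderOutput` are hypothetical stand-ins written for illustration, and this `retrieve_latents` is only loosely modeled on the helper of the same name used in diffusers pipelines; it is not the code changed in this PR.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyGaussian:
    """Hypothetical stand-in for a VAE's diagonal Gaussian posterior."""
    mean: List[float]
    logvar: List[float]

    def sample(self, rng: Optional[random.Random] = None) -> List[float]:
        # mean + std * eps with eps ~ N(0, 1): stochastic, depends on the RNG.
        rng = rng or random.Random()
        return [
            m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(self.mean, self.logvar)
        ]

    def mode(self) -> List[float]:
        # The mode of a Gaussian is its mean: deterministic, no noise added.
        return list(self.mean)


@dataclass
class ToyEncoderOutput:
    """Hypothetical stand-in for the object returned by vae.encode(...)."""
    latent_dist: ToyGaussian


def retrieve_latents(encoder_output, rng=None, sample_mode="sample"):
    # Assumption: only the two modes discussed in this PR are handled here.
    if sample_mode == "sample":
        return encoder_output.latent_dist.sample(rng)
    if sample_mode == "argmax":
        return encoder_output.latent_dist.mode()
    raise ValueError(f"Unknown sample_mode: {sample_mode!r}")


out = ToyEncoderOutput(ToyGaussian(mean=[0.5, -1.0], logvar=[0.0, -2.0]))
argmax_latents = retrieve_latents(out, sample_mode="argmax")
sampled_latents = retrieve_latents(out, rng=random.Random(0), sample_mode="sample")
print(argmax_latents)   # [0.5, -1.0]
print(sampled_latents)  # the mean plus seeded Gaussian noise
```

With "argmax" the encoded video latents are the posterior mean and do not depend on the generator, whereas "sample" adds noise scaled by the posterior standard deviation on top of the mean.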

Input Video
hiker.mp4
Previous	Current
wan-v2v.mp4	wan-v2v-fixed.mp4

I am unsure if there is supposed to be a visible fix 🤔.

Reproducer

#!pip uninstall diffusers -yq
#!pip install git+https://github.com/tolgacangoz/diffusers.git@fix-wanv2v-vae ftfy -q
import torch
from diffusers.utils import load_video, export_to_video
from diffusers import AutoencoderKLWan, WanVideoToVideoPipeline, UniPCMultistepScheduler

# Available models: Wan-AI/Wan2.1-T2V-14B-Diffusers, Wan-AI/Wan2.1-T2V-1.3B-Diffusers
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(
    model_id, subfolder="vae", torch_dtype=torch.float32
)
pipe = WanVideoToVideoPipeline.from_pretrained(
    model_id, vae=vae, torch_dtype=torch.bfloat16
)
flow_shift = 3.0  # 5.0 for 720P, 3.0 for 480P
pipe.scheduler = UniPCMultistepScheduler.from_config(
    pipe.scheduler.config, flow_shift=flow_shift
)
#pipe.enable_model_cpu_offload()
pipe = pipe.to('cuda')

prompt = "A robot standing on a mountain top. The sun is setting in the background"
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
video = load_video(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/hiker.mp4"
)
output = pipe(
    video=video,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=480,
    width=512,
    guidance_scale=7.0,
    strength=0.7,
    generator=torch.Generator(device="cuda").manual_seed(0),
).frames[0]

export_to_video(output, "wan-v2v.mp4", fps=16)

@DN6 @a-r-r-o-w

a-r-r-o-w

Looks correct to me because we use argmax in the other pipelines as well IIRC

ukaprch · 2025-05-30T12:37:32Z

In your "current" fix mp4 it appears that you have outliers "pixels" above / below the RGB threshold. I'm talking about those red spots and the light beam that moves across the video.

HuggingFaceDocBuilderDev · 2025-06-02T02:01:06Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

tolgacangoz · 2025-06-02T04:44:35Z

> In your "current" fix mp4 it appears that you have outliers "pixels" above / below the RGB threshold. I'm talking about those red spots and the light beam that moves across the video.

When I complete the SkyReels-V2 integration, I will return to this PR. It is almost done.

tolgacangoz added 2 commits May 30, 2025 12:26

fix: vae sampling mode

6ee6b53

fix a typo

40243c8

tolgacangoz changed the title ~~Fix-wanv2v-vae~~ Fix VAE sampling mode in WanVideoToVideoPipeline May 30, 2025

tolgacangoz changed the title ~~Fix VAE sampling mode in WanVideoToVideoPipeline~~ [Wan] Fix VAE sampling mode in WanVideoToVideoPipeline May 30, 2025

a-r-r-o-w approved these changes May 30, 2025
