Commit bc2cbfb

tolgacangoz authored and sayakpaul committed
[Community Pipeline] Add 🪆Matryoshka Diffusion Models (#9157)
1 parent 76d9796 commit bc2cbfb

2 files changed (+4697, -10 lines)

examples/community/README.md

Lines changed: 56 additions & 10 deletions
````diff
@@ -73,7 +73,8 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
 | Stable Diffusion BoxDiff Pipeline | Training-free controlled generation with bounding boxes using [BoxDiff](https://github.com/showlab/BoxDiff) | [Stable Diffusion BoxDiff Pipeline](#stable-diffusion-boxdiff) | - | [Jingyang Zhang](https://github.com/zjysteven/) |
 | FRESCO V2V Pipeline | Implementation of [[CVPR 2024] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation](https://arxiv.org/abs/2403.12962) | [FRESCO V2V Pipeline](#fresco) | - | [Yifan Zhou](https://github.com/SingleZombie) |
 | AnimateDiff IPEX Pipeline | Accelerate AnimateDiff inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [AnimateDiff on IPEX](#animatediff-on-ipex) | - | [Dan Li](https://github.com/ustcuna/) |
-| HunyuanDiT Differential Diffusion Pipeline | Applies [Differential Diffsuion](https://github.com/exx8/differential-diffusion) to [HunyuanDiT](https://github.com/huggingface/diffusers/pull/8240). | [HunyuanDiT with Differential Diffusion](#hunyuandit-with-differential-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing) | [Monjoy Choudhury](https://github.com/MnCSSJ4x) |
+| HunyuanDiT Differential Diffusion Pipeline | Applies [Differential Diffusion](https://github.com/exx8/differential-diffusion) to [HunyuanDiT](https://github.com/huggingface/diffusers/pull/8240). | [HunyuanDiT with Differential Diffusion](#hunyuandit-with-differential-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing) | [Monjoy Choudhury](https://github.com/MnCSSJ4x) |
+| [🪆Matryoshka Diffusion Models](https://huggingface.co/papers/2310.15111) | A diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small scale inputs are nested within those of the large scales. See [original codebase](https://github.com/apple/ml-mdm). | [🪆Matryoshka Diffusion Models](#matryoshka-diffusion-models) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/pcuenq/mdm) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/tolgacangoz/1f54875fc7aeaabcf284ebde64820966/matryoshka_hf.ipynb) | [M. Tolga Cangöz](https://github.com/tolgacangoz) |
 
 To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly.
 
````
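The new table entry describes Matryoshka Diffusion as denoising inputs at several resolutions jointly, with the small-scale inputs nested inside the large-scale ones. As a rough illustration of what such nested inputs could look like, the sketch below derives a resolution pyramid from one square image by repeated 4× average pooling; the factor of 4 per level matches the 64 → 256 → 1024 resolutions used in this commit's example. The helper names `avg_pool` and `build_nested_inputs`, and the choice of average pooling as the downsampling operator, are assumptions for illustration only and are not taken from the ml-mdm codebase or the community pipeline.

```python
from typing import List

Image = List[List[float]]  # a square grayscale image stored as a list of rows


def avg_pool(image: Image, factor: int) -> Image:
    """Downsample a square image by averaging non-overlapping factor x factor blocks."""
    size = len(image)
    if size % factor != 0 or any(len(row) != size for row in image):
        raise ValueError("expected a square image whose side is divisible by factor")
    pooled = []
    for i in range(0, size, factor):
        row = []
        for j in range(0, size, factor):
            block = [image[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def build_nested_inputs(image: Image, nesting_level: int, factor: int = 4) -> List[Image]:
    """Hypothetical helper: return nesting_level + 1 images, largest first.

    Each smaller image is computed from the next larger one, so every
    low-resolution input is "nested" inside the high-resolution input.
    """
    pyramid = [image]
    for _ in range(nesting_level):
        pyramid.append(avg_pool(pyramid[-1], factor))
    return pyramid


if __name__ == "__main__":
    # A 16x16 toy image stands in for 1024x1024; with factor 4 this gives 16 -> 4 -> 1.
    toy = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
    levels = build_nested_inputs(toy, nesting_level=2)
    print([len(level) for level in levels])  # [16, 4, 1]
```

In the paper the joint diffusion process operates on all of these resolutions at once; this sketch only shows how the nested inputs relate to each other, not the denoising itself.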
````diff
@@ -85,28 +86,28 @@ pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion
 
 ### Flux with CFG
 
-Know more about Flux [here](https://blackforestlabs.ai/announcing-black-forest-labs/). Since Flux doesn't use CFG, this implementation provides one, inspired by the [PuLID Flux adaptation](https://github.com/ToTheBeginning/PuLID/blob/main/docs/pulid_for_flux.md).
+Know more about Flux [here](https://blackforestlabs.ai/announcing-black-forest-labs/). Since Flux doesn't use CFG, this implementation provides one, inspired by the [PuLID Flux adaptation](https://github.com/ToTheBeginning/PuLID/blob/main/docs/pulid_for_flux.md).
 
 Example usage:
 
 ```py
 from diffusers import DiffusionPipeline
-import torch
+import torch
 
 pipeline = DiffusionPipeline.from_pretrained(
-"black-forest-labs/FLUX.1-dev",
-torch_dtype=torch.bfloat16,
+"black-forest-labs/FLUX.1-dev",
+torch_dtype=torch.bfloat16,
 custom_pipeline="pipeline_flux_with_cfg"
 )
 pipeline.enable_model_cpu_offload()
 prompt = "a watercolor painting of a unicorn"
 negative_prompt = "pink"
 
 img = pipeline(
-prompt=prompt,
-negative_prompt=negative_prompt,
-true_cfg=1.5,
-guidance_scale=3.5,
+prompt=prompt,
+negative_prompt=negative_prompt,
+true_cfg=1.5,
+guidance_scale=3.5,
 num_images_per_prompt=1,
 generator=torch.manual_seed(0)
 ).images[0]
````
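The Flux example passes two guidance knobs, `true_cfg=1.5` and `guidance_scale=3.5`. In diffusers' standard `FluxPipeline`, `guidance_scale` is fed to the transformer as a conditioning value (FLUX.1-dev is guidance-distilled) rather than used to mix two predictions, which is why the README says Flux "doesn't use CFG". The sketch below shows the conventional classifier-free guidance combination that a `true_cfg` scale usually refers to. It is an assumption that `pipeline_flux_with_cfg` combines its positive- and negative-prompt predictions exactly this way; `apply_true_cfg` is a hypothetical helper working on plain Python lists instead of tensors.

```python
from typing import List, Sequence


def apply_true_cfg(
    positive: Sequence[float],
    negative: Sequence[float],
    true_cfg: float,
) -> List[float]:
    """Classifier-free guidance: push the prediction away from the negative prompt.

    guided = negative + true_cfg * (positive - negative)
    With true_cfg == 1.0 the positive prediction is returned unchanged.
    """
    if len(positive) != len(negative):
        raise ValueError("positive and negative predictions must have the same length")
    return [n + true_cfg * (p - n) for p, n in zip(positive, negative)]


if __name__ == "__main__":
    pos = [0.2, -0.4, 1.0]  # toy prediction conditioned on `prompt`
    neg = [0.0, 0.0, 0.5]   # toy prediction conditioned on `negative_prompt`
    print(apply_true_cfg(pos, neg, true_cfg=1.5))
```

Standard CFG needs a second prediction for the negative prompt at every denoising step, so a pipeline that adds it will generally run slower per step than plain Flux.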
````diff
@@ -2656,7 +2657,7 @@ image with mask mech_painted.png
 
 <img src=https://github.com/noskill/diffusers/assets/733626/c334466a-67fe-4377-9ff7-f46021b9c224 width="25%" >
 
-result:
+result:
 
 <img src=https://github.com/noskill/diffusers/assets/733626/5043fb57-a785-4606-a5ba-a36704f7cb42 width="25%" >
 
````
````diff
@@ -4324,6 +4325,51 @@ image = pipe(
 
 A colab notebook demonstrating all results can be found [here](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing). Depth Maps have also been added in the same colab.
 
+### 🪆Matryoshka Diffusion Models
+
+![🪆Matryoshka Diffusion Models](https://github.com/user-attachments/assets/bf90b53b-48c3-4769-a805-d9dfe4a7c572)
+
+The abstract of the paper:
+>Diffusion models are the _de-facto_ approach for generating high-quality images and videos but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space, or using a downsampled latent space of a separately trained auto-encoder. In this paper, we introduce Matryoshka Diffusion (MDM), **a novel framework for high-resolution image and video synthesis**. We propose a diffusion process that denoises inputs at multiple resolutions jointly and uses a **NestedUNet** architecture where features and parameters for small scale inputs are nested within those of the large scales. In addition, MDM enables a progressive training schedule from lower to higher resolutions which leads to significant improvements in optimization for high-resolution generation. We demonstrate the effectiveness of our approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications. Remarkably, we can train a **_single pixel-space model_ at resolutions of up to 1024 × 1024 pixels**, demonstrating strong zero shot generalization using the **CC12M dataset, which contains only 12 million images**. Code and pre-trained checkpoints are released at https://github.com/apple/ml-mdm.
+
+- `64×64, nesting_level=0`: 1.719 GiB. With `50` DDIM inference steps:
+
+**64x64**
+:-------------------------:
+| <img src="https://github.com/user-attachments/assets/9e7bb2cd-45a0-4bd1-adb8-23e283baed39" width="222" height="222" alt="bird_64"> |
+
+- `256×256, nesting_level=1`: 1.776 GiB. With `150` DDIM inference steps:
+
+**64x64** | **256x256**
+:-------------------------:|:-------------------------:
+| <img src="https://github.com/user-attachments/assets/6b724c2e-5e6a-4b63-9b65-c1182cbb67e0" width="222" height="222" alt="64x64"> | <img src="https://github.com/user-attachments/assets/7dbab2ad-bf40-4a73-ab04-f178347cb7d5" width="222" height="222" alt="256x256"> |
+
+- `1024×1024, nesting_level=2`: 1.792 GiB. As these numbers show, adding another nesting level costs very little extra memory. With `250` DDIM inference steps:
+
+**64x64** | **256x256** | **1024x1024**
+:-------------------------:|:-------------------------:|:-------------------------:
+| <img src="https://github.com/user-attachments/assets/4a9454e4-e20a-4736-a196-270e2ae796c0" width="222" height="222" alt="64x64"> | <img src="https://github.com/user-attachments/assets/4a96555d-0fda-4303-82b1-a4d886f770b9" width="222" height="222" alt="256x256"> | <img src="https://github.com/user-attachments/assets/e0239b7a-ab73-4d45-8f3e-b4e6b4b50abe" width="222" height="222" alt="1024x1024"> |
+
+```py
+from diffusers import DiffusionPipeline
+from diffusers.utils import make_image_grid
+
+# nesting_level=0 -> 64x64; nesting_level=1 -> 256x256 - 64x64; nesting_level=2 -> 1024x1024 - 256x256 - 64x64
+pipe = DiffusionPipeline.from_pretrained("tolgacangoz/matryoshka-diffusion-models",
+    nesting_level=0,
+    trust_remote_code=False,  # set to True to give permission for this repo's custom code to run
+).to("cuda")
+
+prompt0 = "a blue jay stops on the top of a helmet of Japanese samurai, background with sakura tree"
+prompt = f"breathtaking {prompt0}. award-winning, professional, highly detailed"
+negative_prompt = "deformed, mutated, ugly, disfigured, blur, blurry, noise, noisy"
+image = pipe(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=50).images
+make_image_grid(image, rows=1, cols=len(image))
+
+# pipe.change_nesting_level(<int>) # 0, 1, or 2
+# 50+, 100+, and 250+ num_inference_steps are recommended for nesting levels 0, 1, and 2 respectively.
+```
+
 # Perturbed-Attention Guidance
 
 [Project](https://ku-cvlab.github.io/Perturbed-Attention-Guidance/) / [arXiv](https://arxiv.org/abs/2403.17377) / [GitHub](https://github.com/KU-CVLAB/Perturbed-Attention-Guidance)
````
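The Matryoshka section reports 1.719, 1.776 and 1.792 GiB for nesting levels 0, 1 and 2, without saying how these figures were measured. Taking them at face value, the short calculation below converts the per-level increments to MiB (1 GiB = 1024 MiB), which makes concrete why the extra nesting levels are described as cheap. `level_increments_mib` is a hypothetical helper; only the three GiB values come from the README.

```python
from typing import Dict

# Memory figures quoted in the README for each nesting level, in GiB.
# The README does not say how they were measured; they are used as-is here.
REPORTED_GIB: Dict[int, float] = {0: 1.719, 1: 1.776, 2: 1.792}


def level_increments_mib(reported_gib: Dict[int, float]) -> Dict[int, float]:
    """Extra memory of each level over the previous one, in MiB, rounded to 0.1."""
    levels = sorted(reported_gib)
    return {
        level: round((reported_gib[level] - reported_gib[prev]) * 1024, 1)
        for prev, level in zip(levels, levels[1:])
    }


if __name__ == "__main__":
    print(level_increments_mib(REPORTED_GIB))  # {1: 58.4, 2: 16.4}
    total = (REPORTED_GIB[2] - REPORTED_GIB[0]) / REPORTED_GIB[0]
    print(f"level 2 vs level 0: +{total:.1%}")
```

That is roughly 58 MiB for the first extra level and 16 MiB for the second, or about 4% over the 64×64 baseline in total.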

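The comments in the Matryoshka example map each `nesting_level` to the resolutions it produces and recommend at least 50, 100 and 250 inference steps for levels 0, 1 and 2. A small lookup like the hypothetical `nesting_plan` below can check a configuration against those recommendations before calling the pipeline, and its result can size the grid in `make_image_grid(image, rows=1, cols=len(image))`, assuming the pipeline returns one image per resolution as the result tables suggest. The helper is an illustration, not part of the community pipeline's API, and it does not call diffusers.

```python
import warnings
from typing import Dict, List

# Taken from the README comments: resolutions produced at each nesting level
# (largest first, as written there) and the recommended minimum step counts.
RESOLUTIONS: Dict[int, List[int]] = {0: [64], 1: [256, 64], 2: [1024, 256, 64]}
MIN_STEPS: Dict[int, int] = {0: 50, 1: 100, 2: 250}


def nesting_plan(nesting_level: int, num_inference_steps: int) -> List[int]:
    """Hypothetical helper: validate a Matryoshka configuration.

    Returns the resolutions the README associates with `nesting_level` and
    emits a warning when `num_inference_steps` is below the recommended minimum.
    """
    if nesting_level not in RESOLUTIONS:
        raise ValueError(f"nesting_level must be one of {sorted(RESOLUTIONS)}")
    minimum = MIN_STEPS[nesting_level]
    if num_inference_steps < minimum:
        warnings.warn(
            f"{num_inference_steps} steps is below the recommended {minimum}+ "
            f"for nesting_level={nesting_level}"
        )
    return list(RESOLUTIONS[nesting_level])  # copy so callers cannot mutate the table


if __name__ == "__main__":
    sizes = nesting_plan(2, num_inference_steps=250)
    print(sizes)  # [1024, 256, 64]
    print(len(sizes))  # 3, one grid column per resolution
```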