stable-diffusion: TAESD implementation - faster autoencoder #88


Merged: 25 commits merged into leejet:master from FSSRepo:taesd-impl on Dec 5, 2023

Conversation

@FSSRepo (Contributor) commented Nov 26, 2023

Fixes #36

This is a quick implementation, so I may have overlooked something. Initially, I thought about implementing it in a separate header, but there were some dependency collisions with ggml, and I honestly didn't want to bother solving that, so I added it directly to the stable-diffusion.cpp file.

Results

| Props | AutoEncoderKL | TinyAutoEncoder |
| --- | --- | --- |
| Test 1 | output_akl | output_taesd |
| Test 2 | output_kl | output_tae |
| Processing time (CPU backend) | 24 seconds | 1.8 seconds |
| Params memory | 95 MB | 2 MB |
| Compute memory (512 × 512) | 1664 MB | 416 MB |

How to use it:

Just add --taesd TAESD_MODEL_PATH to the command line; it only works for txt2img for now:

git clone https://github.com/FSSRepo/stable-diffusion.cpp.git
cd stable-diffusion.cpp
mkdir build
cd build
cmake ..
cmake --build . --config Release

sd -m model.gguf -p "a lovely cat" --taesd taesd-model.gguf

Tasks:

  • Implement the unary operation ggml_tanh in CUDA for full offloading support (see the sketch below).
  • Complete the implementation of the encoder and perform tests.
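
For reference, a minimal sketch of what that kernel could look like, written in the style of the existing unary ops in ggml-cuda.cu; the names `tanh_f32` and `CUDA_TANH_BLOCK_SIZE` are illustrative, and the dispatch wiring into `ggml_cuda_compute_forward` is omitted:

```cpp
// CUDA C++ sketch of an element-wise tanh, mirroring the style of the
// existing unary ops (e.g. silu) in ggml-cuda.cu.
#define CUDA_TANH_BLOCK_SIZE 256

static __global__ void tanh_f32(const float * x, float * dst, const int k) {
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i >= k) {
        return;
    }
    dst[i] = tanhf(x[i]); // device intrinsic for single-precision tanh
}

static void tanh_f32_cuda(const float * x, float * dst, const int k, cudaStream_t stream) {
    // launch one thread per element, rounded up to whole blocks
    const int num_blocks = (k + CUDA_TANH_BLOCK_SIZE - 1) / CUDA_TANH_BLOCK_SIZE;
    tanh_f32<<<num_blocks, CUDA_TANH_BLOCK_SIZE, 0, stream>>>(x, dst, k);
}
```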

@Green-Sky (Contributor):

> just add --taesd to command line

Is this added to the gguf? If it's replacing the normal VAE, then this should be inferred from the gguf file.

@FSSRepo (Contributor, Author) commented Nov 26, 2023

@Green-Sky It's a separate model; I included taesd-decoder.gguf, and it should be in your working path. This does not replace the original VAE: when TAESD is used, the original VAE is ignored, reducing the amount of memory used.

@Green-Sky (Contributor) commented Nov 26, 2023

testing with lcm 4 steps:

taesd

output

[INFO]  stable-diffusion.cpp:5430 - sampling completed, taking 17.13s
[INFO]  stable-diffusion.cpp:5448 - latent 1 decoded, taking 1.44s

vae

⚠️ produces invalid image file ⚠️

@FSSRepo (Contributor, Author) commented Nov 26, 2023

> ⚠️ produces invalid image file ⚠️

I don't understand.

@Green-Sky (Contributor) commented Nov 26, 2023

> ⚠️ produces invalid image file ⚠️
>
> I don't understand.

OK: running without taesd AND with the LCM LoRA AND with the euler_a sampler (so not the LCM sampler) produces invalid files. Running more tests...

@Green-Sky (Contributor) commented Nov 26, 2023

works:

$ bin/sd -m ../models/epicphotogasm_lastUnicorn-f16.gguf --lora-model-dir ../models/loras/sd1-gguf/ -p "<lora:lcm-lora-sdv1-5:1>a lovely cat" --steps 4 --cfg-scale 1.0 --sampling-method lcm
$ bin/sd -m ../models/epicphotogasm_lastUnicorn-f16.gguf --lora-model-dir ../models/loras/sd1-gguf/ -p "<lora:lcm-lora-sdv1-5:1>a lovely cat" --steps 4 --cfg-scale 1.0 --sampling-method euler_a --taesd

fails:

$ bin/sd -m ../models/epicphotogasm_lastUnicorn-f16.gguf --lora-model-dir ../models/loras/sd1-gguf/ -p "<lora:lcm-lora-sdv1-5:1>a lovely cat" --steps 4 --cfg-scale 1.0 --sampling-method euler_a
$ bin/sd -m ../models/epicphotogasm_lastUnicorn-f16.gguf --lora-model-dir ../models/loras/sd1-gguf/ -p "<lora:lcm-lora-sdv1-5:1>a lovely cat" --steps 4 --cfg-scale 1.0 --sampling-method euler

edit: invalid png
output

@FSSRepo (Contributor, Author) commented Nov 26, 2023

@Green-Sky I have conducted some tests, and I am not getting any errors related to the regular VAE or the sampling. Does master have this error?

@Green-Sky (Contributor):

Using a different, non-LCM-LoRA model, it does not produce an invalid file:

$ bin/sd -m ./LCM_Dreamshaper_v7-f16.gguf -p "a lovely cat" --steps 4 --cfg-scale 1.0 --sampling-method euler_a

> @Green-Sky I have conducted some tests, and I am not getting any errors related to the regular VAE or the sampling. Does master have this error?

Yes, it appears this also happens on master.

@FSSRepo (Contributor, Author) commented Nov 26, 2023

> yes, it appears this also happens on master.

Try the last commit before the latest commit of master.

@FSSRepo (Contributor, Author) commented Nov 26, 2023

build\bin\Release\sd -m kotosmix_v10-f16.gguf -p "<lora:Kana_Arima-10:0.9><lora:lcm-lora:1>beautiful anime girl, short hair, red hair, red eyes, realistic, masterpiece, azur lane, 4k, high quality" --sampling-method lcm --cfg-scale 1 --steps 5 -t 6 -s 424354
| Model | AutoEncoderKL | TAESD |
| --- | --- | --- |
| kotosmix_v10-f16.gguf | output | output |
| AnythingV5_v5PrtRE-f16.gguf | output | output |

This is definitely not an error. It is well known that the LCM LoRA only works with the LCM sampler; with any other sampler, it starts producing degraded, malformed results.

@Green-Sky (Contributor):

> yes, it appears this also happens on master.
>
> Try the last commit before the latest commit of master.

Not sure that can be indicative, since that is pre-gguf, so it may be a different model file.

On master with seed 1337 (instead of 42) it actually crashes, so I ran it in debug:

sd: /home/green/workspace/stable-diffusion.cpp/common/./stb_image_write.h:1226: unsigned char* stbi_write_png_to_mem(const unsigned char*, int, int, int, int, int*, const char*): Assertion `o == out + *out_len' failed.
Aborted (core dumped)

@Green-Sky (Contributor) commented Nov 27, 2023

I switched back to this branch, and now it works in release mode, BUT in debug it triggers the assert(). (@leejet there is a bug in master.)

Sorry for spamming unrelated comments in your taesd PR 👀

edit: ah yes, now it generates invalid files with the LCM sampler and TAESD 😵‍💫

@FSSRepo (Contributor, Author) commented Nov 27, 2023

When I enable Debug, I get:

Assertion failed: node->src[j]->backend == GGML_BACKEND_GPU, file C:\proyectos\stable-diffusion.cpp\ggml\src\ggml-cuda.cu, line 8738

This happens only when I use --taesd, but I already know why it happens.

@Jonathhhan:

Wow, very fast. Maybe it would be nice if it were possible to set the .gguf file path in stable-diffusion.h (like with the LoRA files)?

@leejet (Owner) commented Nov 27, 2023

> Wow, very fast. Maybe it would be nice if it were possible to set the .gguf file path in stable-diffusion.h (like with the LoRA files)?

I also think it would be better to specify the path of the TAE model through parameters rather than hardcoding it.

@FSSRepo (Contributor, Author) commented Nov 27, 2023

Now the TAE model should be passed like this: --taesd TAE_MODEL_PATH. If you pass it, it will be used:

build\bin\Release\sd -m AnythingV5_v5PrtRE-f16.gguf --taesd taesd-model.gguf -p "<lora:Kana_Arima-10:0.9><lora:lcm-lora:1>beautiful anime girl, short hair, red hair, red eyes, realistic, masterpiece, azur lane, 4k, high quality" --sampling-method lcm --cfg-scale 1 --steps 5 -t 1 -s 424354

@FSSRepo (Contributor, Author) commented Nov 27, 2023

@leejet The TinyAutoEncoder's encoder is quite bad; I'll have to run some tests against the original Python implementation. I tried passing the init_latent generated by the encoder directly to the decoder, and I get dark results, whereas with AutoencoderKL I get the original image back.
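
One thing worth ruling out when the round trip comes out dark is a latent-scale mismatch: AutoencoderKL latents are conventionally multiplied by SD's 0.18215 scale factor after encoding, while TAESD is trained to work directly in that already-scaled space. As a sketch under that assumption (these helpers are illustrative, not code from this PR), converting between the two conventions would look like:

```cpp
#include <cstddef>

// SD's standard AutoencoderKL latent scale factor.
static const float SD_LATENT_SCALE = 0.18215f;

// raw AutoencoderKL latent -> the scaled space the TAESD decoder expects
void kl_raw_to_taesd(const float * z_raw, float * z_taesd, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        z_taesd[i] = z_raw[i] * SD_LATENT_SCALE;
    }
}

// TAESD-encoder latent -> the raw space the AutoencoderKL decoder expects
void taesd_to_kl_raw(const float * z_taesd, float * z_raw, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        z_raw[i] = z_taesd[i] / SD_LATENT_SCALE;
    }
}
```

Mixing the conventions, for example applying the 0.18215 factor on top of a TAESD-encoded latent, tends to produce exactly this kind of dark output.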

@leejet (Owner) commented Nov 27, 2023

> I tried passing the init_latent generated by the encoder directly to the decoder, and I get dark results.

Are you using the original Python implementation?

@leejet (Owner) commented Nov 27, 2023

I suggest that you compare against the original Python version to see whether the problem is with the model itself or with your implementation.

@Amin456789 commented Nov 27, 2023

Having this working with the LCM LoRA will be a huge speed boost, thanks!

@FSSRepo (Contributor, Author) commented Dec 4, 2023

@leejet So I should specify that in README.md and delete the existing gguf file here.

> I will try to make them compatible.

I think it would be cumbersome to support the gguf format and handle the discrepancies with the taesd safetensors. It's better to keep compatibility with the safetensors file that already contains all the tensors of the encoder and decoder; this should be specified in the README.md file.

@Green-Sky GGUF-formatted LoRAs will no longer be accepted due to naming differences, and correcting that seems cumbersome to me. Only safetensors and ckpt formats will be accepted from now on. In short, you can delete them, as they are now obsolete.

@leejet (Owner) commented Dec 4, 2023

I'm using this one, https://huggingface.co/madebyollin/taesd/blob/main/diffusion_pytorch_model.safetensors, because this file contains both the encoder and decoder, and it doesn't take up much storage space.

@FSSRepo (Contributor, Author) commented Dec 4, 2023

@leejet Leaving 10 MB of padding in cal_mem_size arbitrarily seems excessive to me.

@leejet (Owner) commented Dec 4, 2023

The names of the tensors in https://huggingface.co/madebyollin/taesd/blob/main/diffusion_pytorch_model.safetensors:

decoder.layers.0.bias
decoder.layers.0.weight
decoder.layers.11.weight
decoder.layers.12.conv.0.bias
decoder.layers.12.conv.0.weight
decoder.layers.12.conv.2.bias
decoder.layers.12.conv.2.weight
decoder.layers.12.conv.4.bias
decoder.layers.12.conv.4.weight
decoder.layers.13.conv.0.bias
decoder.layers.13.conv.0.weight
decoder.layers.13.conv.2.bias
decoder.layers.13.conv.2.weight
decoder.layers.13.conv.4.bias
decoder.layers.13.conv.4.weight
decoder.layers.14.conv.0.bias
decoder.layers.14.conv.0.weight
decoder.layers.14.conv.2.bias
decoder.layers.14.conv.2.weight
decoder.layers.14.conv.4.bias
decoder.layers.14.conv.4.weight
decoder.layers.16.weight
decoder.layers.17.conv.0.bias
decoder.layers.17.conv.0.weight
decoder.layers.17.conv.2.bias
decoder.layers.17.conv.2.weight
decoder.layers.17.conv.4.bias
decoder.layers.17.conv.4.weight
decoder.layers.18.bias
decoder.layers.18.weight
decoder.layers.2.conv.0.bias
decoder.layers.2.conv.0.weight
decoder.layers.2.conv.2.bias
decoder.layers.2.conv.2.weight
decoder.layers.2.conv.4.bias
decoder.layers.2.conv.4.weight
decoder.layers.3.conv.0.bias
decoder.layers.3.conv.0.weight
decoder.layers.3.conv.2.bias
decoder.layers.3.conv.2.weight
decoder.layers.3.conv.4.bias
decoder.layers.3.conv.4.weight
decoder.layers.4.conv.0.bias
decoder.layers.4.conv.0.weight
decoder.layers.4.conv.2.bias
decoder.layers.4.conv.2.weight
decoder.layers.4.conv.4.bias
decoder.layers.4.conv.4.weight
decoder.layers.6.weight
decoder.layers.7.conv.0.bias
decoder.layers.7.conv.0.weight
decoder.layers.7.conv.2.bias
decoder.layers.7.conv.2.weight
decoder.layers.7.conv.4.bias
decoder.layers.7.conv.4.weight
decoder.layers.8.conv.0.bias
decoder.layers.8.conv.0.weight
decoder.layers.8.conv.2.bias
decoder.layers.8.conv.2.weight
decoder.layers.8.conv.4.bias
decoder.layers.8.conv.4.weight
decoder.layers.9.conv.0.bias
decoder.layers.9.conv.0.weight
decoder.layers.9.conv.2.bias
decoder.layers.9.conv.2.weight
decoder.layers.9.conv.4.bias
decoder.layers.9.conv.4.weight
encoder.layers.0.bias
encoder.layers.0.weight
encoder.layers.1.conv.0.bias
encoder.layers.1.conv.0.weight
encoder.layers.1.conv.2.bias
encoder.layers.1.conv.2.weight
encoder.layers.1.conv.4.bias
encoder.layers.1.conv.4.weight
encoder.layers.10.weight
encoder.layers.11.conv.0.bias
encoder.layers.11.conv.0.weight
encoder.layers.11.conv.2.bias
encoder.layers.11.conv.2.weight
encoder.layers.11.conv.4.bias
encoder.layers.11.conv.4.weight
encoder.layers.12.conv.0.bias
encoder.layers.12.conv.0.weight
encoder.layers.12.conv.2.bias
encoder.layers.12.conv.2.weight
encoder.layers.12.conv.4.bias
encoder.layers.12.conv.4.weight
encoder.layers.13.conv.0.bias
encoder.layers.13.conv.0.weight
encoder.layers.13.conv.2.bias
encoder.layers.13.conv.2.weight
encoder.layers.13.conv.4.bias
encoder.layers.13.conv.4.weight
encoder.layers.14.bias
encoder.layers.14.weight
encoder.layers.2.weight
encoder.layers.3.conv.0.bias
encoder.layers.3.conv.0.weight
encoder.layers.3.conv.2.bias
encoder.layers.3.conv.2.weight
encoder.layers.3.conv.4.bias
encoder.layers.3.conv.4.weight
encoder.layers.4.conv.0.bias
encoder.layers.4.conv.0.weight
encoder.layers.4.conv.2.bias
encoder.layers.4.conv.2.weight
encoder.layers.4.conv.4.bias
encoder.layers.4.conv.4.weight
encoder.layers.5.conv.0.bias
encoder.layers.5.conv.0.weight
encoder.layers.5.conv.2.bias
encoder.layers.5.conv.2.weight
encoder.layers.5.conv.4.bias
encoder.layers.5.conv.4.weight
encoder.layers.6.weight
encoder.layers.7.conv.0.bias
encoder.layers.7.conv.0.weight
encoder.layers.7.conv.2.bias
encoder.layers.7.conv.2.weight
encoder.layers.7.conv.4.bias
encoder.layers.7.conv.4.weight
encoder.layers.8.conv.0.bias
encoder.layers.8.conv.0.weight
encoder.layers.8.conv.2.bias
encoder.layers.8.conv.2.weight
encoder.layers.8.conv.4.bias
encoder.layers.8.conv.4.weight
encoder.layers.9.conv.0.bias
encoder.layers.9.conv.0.weight
encoder.layers.9.conv.2.bias
encoder.layers.9.conv.2.weight
encoder.layers.9.conv.4.bias
encoder.layers.9.conv.4.weight

The names of the tensors in https://huggingface.co/madebyollin/taesd/blob/main/taesd_decoder.safetensors:

1.bias
1.weight
10.conv.0.bias
10.conv.0.weight
10.conv.2.bias
10.conv.2.weight
10.conv.4.bias
10.conv.4.weight
12.weight
13.conv.0.bias
13.conv.0.weight
13.conv.2.bias
13.conv.2.weight
13.conv.4.bias
13.conv.4.weight
14.conv.0.bias
14.conv.0.weight
14.conv.2.bias
14.conv.2.weight
14.conv.4.bias
14.conv.4.weight
15.conv.0.bias
15.conv.0.weight
15.conv.2.bias
15.conv.2.weight
15.conv.4.bias
15.conv.4.weight
17.weight
18.conv.0.bias
18.conv.0.weight
18.conv.2.bias
18.conv.2.weight
18.conv.4.bias
18.conv.4.weight
19.bias
19.weight
3.conv.0.bias
3.conv.0.weight
3.conv.2.bias
3.conv.2.weight
3.conv.4.bias
3.conv.4.weight
4.conv.0.bias
4.conv.0.weight
4.conv.2.bias
4.conv.2.weight
4.conv.4.bias
4.conv.4.weight
5.conv.0.bias
5.conv.0.weight
5.conv.2.bias
5.conv.2.weight
5.conv.4.bias
5.conv.4.weight
7.weight
8.conv.0.bias
8.conv.0.weight
8.conv.2.bias
8.conv.2.weight
8.conv.4.bias
8.conv.4.weight
9.conv.0.bias
9.conv.0.weight
9.conv.2.bias
9.conv.2.weight
9.conv.4.bias
9.conv.4.weight

They use different indices, so we have to pick one naming scheme and remap the other onto it; the sketch below shows the idea.
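
Comparing the two lists, the decoder indices differ by a constant offset of one (the standalone file has an extra leading layer) plus the decoder.layers. prefix, so a remap is mechanical. A minimal sketch, taking the standalone scheme as the target (`remap_decoder_name` is a hypothetical helper, not code from this PR):

```cpp
#include <cstddef>
#include <string>

// Map a diffusers-style name ("decoder.layers.12.conv.0.bias") onto the
// standalone taesd_decoder.safetensors scheme ("13.conv.0.bias"):
// strip the prefix and shift the layer index by one.
static std::string remap_decoder_name(const std::string & name) {
    const std::string prefix = "decoder.layers.";
    if (name.compare(0, prefix.size(), prefix) != 0) {
        return name; // not a decoder tensor, leave it untouched
    }
    const std::string rest = name.substr(prefix.size()); // "12.conv.0.bias"
    const size_t dot = rest.find('.');
    const int idx = std::stoi(rest.substr(0, dot));
    return std::to_string(idx + 1) + rest.substr(dot);   // "13.conv.0.bias"
}
```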

@leejet (Owner) commented Dec 4, 2023

> @leejet Leaving 10 MB of padding in cal_mem_size arbitrarily seems excessive to me.

This is test code that was left behind initially. I'll remove it.

@FSSRepo (Contributor, Author) commented Dec 4, 2023

It's better to keep compatibility only with the safetensors file that comes with everything, the first one.

I'm making some changes; please wait a little bit.

@leejet (Owner) commented Dec 4, 2023

OK, I'll submit it when you're done.

@FSSRepo (Contributor, Author) commented Dec 4, 2023

@leejet Try creating a function that accurately calculates the memory usage of the LoRA parameters (see the sketch below).

For TAESD, we will only support the safetensors file, which already includes all tensors from both the encoder and decoder, for simplicity.
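
A rough sketch of that idea: replace the fixed 10 MB pad in cal_mem_size with an exact sum over the LoRA tensors. The `lora_tensors` map and the include path are assumptions about the loader's actual data structures:

```cpp
#include <map>
#include <string>

#include "ggml/ggml.h"

// Sum the exact byte footprint of every LoRA tensor, plus ggml's
// per-tensor bookkeeping, instead of padding cal_mem_size arbitrarily.
static size_t lora_params_mem_size(const std::map<std::string, ggml_tensor *> & lora_tensors) {
    size_t mem_size = 0;
    for (const auto & kv : lora_tensors) {
        mem_size += ggml_nbytes(kv.second);              // raw tensor data
        mem_size += GGML_TENSOR_SIZE + GGML_OBJECT_SIZE; // ggml object overhead
    }
    return mem_size;
}
```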

@Cyberhan123 (Contributor) commented Dec 4, 2023

@FSSRepo It would be great if you could take the time to upgrade ggml, so that I can remove my pile of CMake that supports ROCm, and also so I can test the ROCm support. Out of ignorance I bought a 7900 XTX, and I have since passed the point of no return on supporting ROCm and HIP.

@Cyberhan123 (Contributor):

> @FSSRepo It would be great if you could take the time to upgrade ggml, so that I can remove my pile of CMake that supports ROCm, and also so I can test the ROCm support. Out of ignorance I bought a 7900 XTX, and I have since passed the point of no return on supporting ROCm and HIP.

Although ROCm is very rubbish on Windows, it is better than nothing.

@FSSRepo (Contributor, Author) commented Dec 4, 2023

@Cyberhan123 I'm waiting for my pull request, which adds the functions stable diffusion needs, to be merged into the upstream ggml repository.

@FSSRepo (Contributor, Author) commented Dec 4, 2023

@leejet Ready; you can add the final changes and test stable diffusion 2.1 with CUDA again.

@leejet (Owner) commented Dec 4, 2023

> @leejet Try creating a function that accurately calculates the memory usage of the LoRA parameters.

It's been fixed now.

@leejet (Owner) commented Dec 4, 2023

> @Cyberhan123 I'm waiting for my pull request, which adds the functions stable diffusion needs, to be merged into the upstream ggml repository.

This PR will be merged soon, after I finish testing the necessary features, probably today or tomorrow.

@Cyberhan123 (Contributor):

Sorry, this may be caused by my misunderstanding across Chinese, English, and Google Translate. What I meant is merging in the main branch of ggml; I need this change: ggml-org/ggml#626

@Cyberhan123 (Contributor):

@FSSRepo @leejet I'm really sorry. I really need to improve my English.

@Jonathhhan commented Dec 5, 2023

I tried the newest version, and now I have the issue that ggml_type is not declared in stable-diffusion.h. Which file do I need to include to declare it? (I run it as a library, which was working before the update, so the original is probably fine.)

On Linux: error: ‘ggml_type’ has not been declared.

Edit:
Adding this to stable-diffusion.h works for me.

enum ggml_type {
    GGML_TYPE_F32 = 0,
    GGML_TYPE_F16 = 1,
    GGML_TYPE_Q4_0 = 2,
    GGML_TYPE_Q4_1 = 3,
    // GGML_TYPE_Q4_2 = 4, support has been removed
    // GGML_TYPE_Q4_3 (5) support has been removed
    GGML_TYPE_Q5_0 = 6,
    GGML_TYPE_Q5_1 = 7,
    GGML_TYPE_Q8_0 = 8,
    GGML_TYPE_Q8_1 = 9,
    // k-quantizations
    GGML_TYPE_Q2_K = 10,
    GGML_TYPE_Q3_K = 11,
    GGML_TYPE_Q4_K = 12,
    GGML_TYPE_Q5_K = 13,
    GGML_TYPE_Q6_K = 14,
    GGML_TYPE_Q8_K = 15,
    GGML_TYPE_I8,
    GGML_TYPE_I16,
    GGML_TYPE_I32,
    GGML_TYPE_COUNT,
};

@leejet (Owner) commented Dec 5, 2023

That's because ggml_type was recently introduced into stable-diffusion.h. I was thinking about how to provide a pure C API that hides the details of ggml.

@FSSRepo (Contributor, Author) commented Dec 5, 2023

@leejet Later on, we will need to refactor the code to eliminate a large portion of the duplicated code and make the stable diffusion API more flexible to use. For this reason, I propose doing something similar to what llama.cpp and whisper.cpp did: natively supporting a C API.
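
As an illustration of what such a boundary could look like, in the llama.cpp/whisper.cpp spirit, with every ggml detail behind an opaque handle; all of the names below are hypothetical, nothing here exists in the project yet:

```cpp
// Hypothetical pure C interface for stable-diffusion.cpp (sketch only).
#ifndef SD_C_API_H
#define SD_C_API_H

#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

typedef struct sd_ctx sd_ctx_t; // opaque: hides ggml contexts, tensors, ggml_type

// taesd_path may be NULL to use the model's regular VAE.
sd_ctx_t * sd_ctx_new(const char * model_path, const char * taesd_path);

// Returns a caller-owned RGB buffer of width * height * 3 bytes, or NULL on error.
uint8_t * sd_txt2img(sd_ctx_t * ctx, const char * prompt,
                     int width, int height, int steps, int64_t seed);

void sd_ctx_free(sd_ctx_t * ctx);

#ifdef __cplusplus
}
#endif

#endif // SD_C_API_H
```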

@leejet (Owner) commented Dec 5, 2023

> @leejet Later on, we will need to refactor the code to eliminate a large portion of the duplicated code and make the stable diffusion API more flexible to use. For this reason, I propose doing something similar to what llama.cpp and whisper.cpp did: natively supporting a C API.

That's what I want to do.

@leejet merged commit 134883a into leejet:master on Dec 5, 2023
@leejet (Owner) commented Dec 5, 2023

@FSSRepo This PR has been merged, thanks for your contribution! You've done an amazing job.

@Jonathhhan:

Also a big thanks from me. One additional note: the TAESD model needs to be in the same folder as the stable diffusion model, which was not the case before.

@FSSRepo deleted the taesd-impl branch on December 5, 2023, 14:47
@leejet (Owner) commented Dec 5, 2023

> The TAESD model needs to be in the same folder as the stable diffusion model, which was not the case before.

I didn't find this limitation; can you post your command parameters?

@Jonathhhan commented Dec 5, 2023

@leejet

> I didn't find this limitation; can you post your command parameters?

Sorry, I made a mistake. It works. Maybe it would be nice to still have the possibility to set the log level in stable-diffusion.h?

@leejet (Owner) commented Dec 5, 2023

This function is currently in util.h, but a unified header will be provided later to expose the associated API.

Successfully merging this pull request may close these issues:

[Feature Request] taesd VAE (distilled VAE) (#36)