Works like a charm. Thanks for this! #20
-
Hey, @pinballelectronica, glad you're getting some use out of the node. Please star the repo if you haven't already and are so inclined. In my testing (limited, but done with various resolutions and durations on a single 3090), it is the overall Pixel Load that matters. This is the data I have, with your data point thrown in: 720x480x200 = 69,120,000 pixels ≈ 13 GB of VRAM for latent space at under 10 minutes (I put it at 9.5). I assumed 16 steps, but if yours was not FastHunyuan I will need to recalibrate the graph. So I'd say you are most definitely faster than my comparable 3090 times with just Sage Attn (my 3090 would come in around 12.5 minutes by the looks of it), and if you aren't using TeaCache already you might want to give it a try. If you have, then your times surprise me a bit. Windows or Linux, if I may ask?
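For reference, a minimal Python sketch of the pixel-load arithmetic described above. The 69,120,000-pixel reference point and the ~13 GB figure come straight from the comment; the linear extrapolation to other resolutions and frame counts is an assumption, not something measured here.

```python
# Rough pixel-load estimate. Assumes latent-space VRAM scales roughly
# linearly with width * height * frames (an assumption, not a measurement).

def pixel_load(width: int, height: int, frames: int) -> int:
    """Total pixel load for a video generation job."""
    return width * height * frames

# Reference point from the comment above: 720x480x200 on a 3090 ~ 13 GB.
REF_LOAD = pixel_load(720, 480, 200)   # 69,120,000
REF_VRAM_GB = 13.0

def estimate_vram_gb(width: int, height: int, frames: int) -> float:
    """Linear extrapolation from the single data point -- a rough guide only."""
    return REF_VRAM_GB * pixel_load(width, height, frames) / REF_LOAD

if __name__ == "__main__":
    print(pixel_load(720, 480, 200))                   # 69120000
    print(round(estimate_vram_gb(960, 544, 129), 1))   # ~12.7 (extrapolated)
```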
-
Was thinking about filing an issue for this and then realized discussions were open. I just wanted to post to say thank you as well! This is a phenomenal piece of software. I have AMD GPUs from two different generations with different amounts of VRAM, and this (plus the much better swapping behavior for huge GGUF models) makes it possible to run large models that I never could before without resorting to two-step workflows of generating a latent, saving it, and then bringing it back into a separate workflow. I've switched over most of what I do now to workflows with these nodes, especially UnetLoaderGGUFAdvancedDisTorchMultiGPU. So, yeah, just wanted to say thank you!
-
Just extending a thanks for the work and the sweet collaboration with the gods of quantization. Runs beautifully with Sage Attn on 2x 4090s in various configs, with regular lora loaders in ComfyUI (loras created using Diffusion Pipe). Very fast: 720x480x200 frames will complete in under 10 minutes. Trying to understand whether I would get better output by matching the video dimensions to the latent sizes that were bucketed prior to training. Thanks!
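As an illustration of the bucket-matching idea raised in the question above, here is a minimal sketch of snapping a requested resolution to the nearest training bucket. The bucket list is a placeholder, not the actual Diffusion Pipe training config, and whether matching buckets actually improves output is exactly the open question.

```python
# Hypothetical helper: snap a requested resolution to the nearest training
# bucket. BUCKETS below is a placeholder -- substitute the sizes that were
# actually used when the LoRA was trained.

BUCKETS = [(512, 512), (720, 480), (960, 544)]  # placeholder values

def nearest_bucket(width: int, height: int) -> tuple[int, int]:
    """Return the bucket closest to the request by aspect ratio and area."""
    def distance(bucket: tuple[int, int]) -> float:
        bw, bh = bucket
        aspect_diff = abs(bw / bh - width / height)
        area_diff = abs(bw * bh - width * height) / (width * height)
        return aspect_diff + area_diff
    return min(BUCKETS, key=distance)

print(nearest_bucket(704, 480))  # -> (720, 480) with the placeholder buckets
```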