Replies: 5 comments
-
My observation is that …
-
Use the LCM LoRA for SD v1.5 with 4 steps and TAESD; it will be much faster.
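For reference, the suggestion above looks roughly like this on the stable-diffusion.cpp CLI. This is a sketch, not a tested recipe: the exact flag names may differ between builds (check `--help` on your binary), and every file name here is a placeholder for whatever you have downloaded locally.

```shell
# Hypothetical invocation: 4-step LCM sampling with the TAESD decoder.
# --cfg-scale 1 is the usual setting for LCM; all paths are placeholders.
./sd -m v1-5-pruned-emaonly.safetensors \
     --lora-model-dir ./loras \
     -p "a photo of a cat <lora:lcm-lora-sdv1-5:1>" \
     --steps 4 \
     --cfg-scale 1 \
     --taesd taesd_decoder.safetensors \
     -o output.png
```

TAESD replaces the full VAE decode with a tiny distilled autoencoder, and the LCM LoRA cuts the step count from ~20 to ~4, so the two compound.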
-
Pretty sure you meant s/it.
-
Yeah, s/it... Please keep in mind that ComfyUI is using f32 (not any lower quantization) with 20 steps and is still more than 2X faster. Anyone know what PyTorch's CPU backend is doing that's so much faster than ggml?
-
@RogerDass It's just that PyTorch implements more optimized convolution algorithms that are too complex to implement in ggml. That's why PyTorch is quite heavy; instead of reinventing the wheel, they reuse existing code to avoid unnecessary complications. |
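To illustrate the kind of optimization meant here (this is a generic sketch, not sd.cpp's or PyTorch's actual code): one classic trick is im2col, which unfolds the input patches into a matrix so the whole convolution becomes a single large matrix multiply that a tuned BLAS can execute very efficiently. A minimal single-channel comparison against the direct quadruple-loop approach:

```python
import numpy as np

def conv2d_naive(x, w):
    """Direct 2D cross-correlation: the simple nested-loop approach."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv2d_im2col(x, w):
    """im2col: unfold each patch into a row, then do one GEMM."""
    H, W = x.shape
    kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    # One big matrix-vector product replaces the inner multiply-accumulate loops.
    return (cols @ w.ravel()).reshape(oh, ow)

x = np.random.rand(64, 64)
w = np.random.rand(3, 3)
assert np.allclose(conv2d_naive(x, w), conv2d_im2col(x, w))
```

The im2col version trades extra memory for the ability to hand all the arithmetic to an optimized GEMM; real frameworks go further (Winograd, FFT, cache-blocked direct kernels), which is the complexity being referred to above.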
-
Hello!
Thanks for making this amazing project!
So I'm running this with the Realistic Vision v1.5 checkpoint. I'm getting ~75 s/it with OpenBLAS enabled.
Any idea how to speed that up significantly?
With ComfyUI on the same machine in CPU mode, I'm getting ~30 s/it, and it takes 10 min to generate a 512x512 image with 20 steps.
What's causing such a large performance difference?
Do you know if there's a way to get some basic OpenGL 3 acceleration for some of the tensor ops?