Description
Hi there --
I'm not sure whether the issue I'm seeing is in these bindings or in the upstream lib, but when using the high-level API on CUBLAS, not all of the VRAM gets released when the model is either garbage collected or explicitly deleted, even after the __del__ wiring does the fancy stepping to invoke free_sd_ctx.
After some experimentation, I've noticed that the amount left behind is almost exactly the amount allocated for the VAE, plus about 100MB. Enabling VAE tiling reduces the size of the leak, and running the VAE phase on the CPU leaves just the 100MB or so of leftovers.
If I were to hazard a guess, more is being allocated in stable-diffusion.cpp's load_from_file() than is being freed by the free_sd_ctx() call.
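For anyone who wants to reproduce the numbers, here is a rough sketch of how the before/after VRAM readings could be taken. It is not the exact script I used: measure_leak and gpu_used_mib are hypothetical helper names, make_model stands in for whatever call constructs the high-level model object in your setup, and the reading assumes nvidia-smi is on the PATH.

```python
import gc
import subprocess
from typing import Callable, Dict


def gpu_used_mib(index: int = 0) -> int:
    """Return used VRAM in MiB for one GPU, as reported by nvidia-smi."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={index}",
            "--query-gpu=memory.used",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return int(out.strip().splitlines()[0])


def measure_leak(
    make_model: Callable[[], object],
    read_used: Callable[[], int] = gpu_used_mib,
) -> Dict[str, int]:
    """Load a model, drop it, and report how much VRAM stays allocated.

    make_model is a placeholder (assumption): pass any zero-argument
    callable that builds the model object whose cleanup you want to check.
    """
    before = read_used()
    model = make_model()
    loaded = read_used()

    # Drop the only reference and force a collection so __del__ runs.
    del model
    gc.collect()
    after = read_used()

    return {
        "allocated_mib": loaded - before,
        "leaked_mib": after - before,
    }
```

Running this once with the VAE on the GPU, once with VAE tiling, and once with the VAE on the CPU should show whether leaked_mib tracks the VAE allocation. Other processes using the same GPU will skew the readings, so it is best run on an otherwise idle card.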
If there's anything I can do to help sleuth this out, please don't hesitate to ask.