For training an SDXL LoRA (1024,1024) on an RTX 4060 with only 8 GB VRAM, I've used a few options that help at that memory budget. Now, with the V25.X GUI, I enabled "Cache text encoder outputs": no more shared-memory spillover, and a speedup from 15-25 s/step to an incredible 1.5 s/step. But resuming a previously trained project seems to start from the beginning, even though the logs report that all steps are loaded from the given directory. Any clues on this?
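For reference, a minimal sketch of what the equivalent direct sd-scripts invocation might look like (an assumption on my side: the GUI checkbox presumably maps to the `--cache_text_encoder_outputs` flag of `sdxl_train_network.py`, which requires training the U-Net only; all paths are placeholders):

```python
# Hedged sketch, not an exact training command; every path below is a placeholder.
import subprocess

cmd = [
    "accelerate", "launch", "sdxl_train_network.py",
    "--pretrained_model_name_or_path", "/path/to/sdxl_base.safetensors",
    "--train_data_dir", "/path/to/dataset",
    "--resolution", "1024,1024",
    "--network_module", "networks.lora",
    "--network_train_unet_only",       # caching TE outputs means the text encoders are not trained
    "--cache_text_encoder_outputs",    # the option that gave the big speedup on 8 GB VRAM
    "--output_dir", "/path/to/output",
]
subprocess.run(cmd, check=True)
```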
I've had a look at the kohya-ss sd-scripts repo; I had already noticed the issue described there myself. Maybe things got worse here in V25 because that issue still hasn't been resolved?
Dug deeper: resume training does work in V25.
Proof:
The resumed training (blue in the graphs) does not redo the LR warmup, and the average loss continues from where the previous run (orange) left off.
The dataset contained 68 images; max steps was initially set to 6800.
The resume config was identical to the initial one, except for the new max steps of 20400 (6800 × 3) and the added last-state directory.
Conclusion:
This is a bug in the kohya_ss scripts, as stated in the link above.
Resuming requires a max steps or max epochs value bigger than what was already reached in the "last state".
Training then stops normally at step 13600, i.e. when the new max steps minus the steps already trained in the "last state" (20400 − 6800) is reached, so only the difference is trained (see the sketch below).
The i…
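A small sketch of the arithmetic described above (my own illustration, not code from the sd-scripts repo; the function name is made up):

```python
def resumed_step_budget(max_train_steps: int, last_state_steps: int) -> int:
    """Steps a resumed run will actually train, per the behavior observed above."""
    if max_train_steps <= last_state_steps:
        # Matches the observation that resuming needs a max steps/epochs
        # value bigger than what the "last state" already reached.
        raise ValueError("max steps must exceed the steps stored in the last state")
    return max_train_steps - last_state_steps

# Numbers from the experiment: 68 images, initial max steps 6800,
# resumed with max steps 20400 (6800 * 3).
print(resumed_step_budget(20400, 6800))  # -> 13600, where training regularly stopped
```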