For the first question, it seems that for the I2V model, the input image condition should not be multiplied by the scale. Therefore, during training, the video latent should be multiplied by the scale, but the image condition should not. I'm not fully sure, and I will try it.
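Dear model researchers,
Hello! I ran into several problems while using the CogVideoX-5B-I2V-v1.5 model. After searching the issues in this repository and in related repositories, I found some preliminary solutions, but after summarizing them I still have the following questions and hope to get them answered.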
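Differences between the SAT model and the diffusers model
Question 1: Is this solution correct?
Other related issues
Symptom: the SAT model and the diffusers model produce different results; with the diffusers model, every frame after the first is slightly grayer and slightly blurrier.
Cause: the official 1.5 I2V diffusers model was trained without multiplying by the vae_scaling_factor_image factor.
Solution: the source code has to be modified manually, in
diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py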
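As a rough illustration of what such a modification changes, the sketch below applies the three candidate scalings to an image latent. It is not the diffusers source: the function `scale_image_latents`, its mode names, and the factor `0.7` are hypothetical stand-ins, and plain Python lists stand in for tensors.

```python
from typing import List


def scale_image_latents(latents: List[float], scaling_factor: float, mode: str) -> List[float]:
    """Scale image-condition latents in one of three ways.

    The mode names are hypothetical labels, not diffusers options:
      "multiply" -> scaling_factor * latents
      "invert"   -> (1 / scaling_factor) * latents
      "identity" -> 1.0 * latents (no scaling)
    """
    if mode == "multiply":
        factor = scaling_factor
    elif mode == "invert":
        factor = 1.0 / scaling_factor
    elif mode == "identity":
        factor = 1.0
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [factor * x for x in latents]


# Placeholder values, for illustration only.
latents = [1.0, -2.0, 0.5]
for mode in ("multiply", "invert", "identity"):
    print(mode, scale_image_latents(latents, scaling_factor=0.7, mode=mode))
```

Whichever variant is used at inference has to match what the weights saw during training, which is the point the question above is trying to settle.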
CogVideoX 1.5 diffusers LoRA fine-tuning problem
Question 2: If the solution above is correct, how should LoRA fine-tuning training and inference be done?
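Option 1
For LoRA fine-tuning training: in the original LoRA fine-tuning code, manually remove the code that multiplies the image latent by the
self.vae_scaling_factor_image
factor.
For LoRA fine-tuning inference: use
1.0 * image_latents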
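Option 2
For LoRA fine-tuning training: leave the original LoRA fine-tuning code unchanged, keeping the image latent's
self.vae_scaling_factor_image
factor.
For LoRA fine-tuning inference: leave the original
pipeline_cogvideox_image2video.py
unchanged, which uses 1 / self.vae_scaling_factor_image * image_latents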
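Background: in the fine-tuning training code, I noticed that every LoRA fine-tuning script multiplies the image latent by vae_scaling_factor. In other words, the factor was not forgotten there, which would be why it has to be divided out afterwards, so that the effective factor during fine-tuning also ends up being 1.0. (Is it only that the official team did not multiply by the factor when pretraining the model?)
Reference code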
CogVideo/finetune/models/cogvideox_i2v/lora_trainer.py
Line 143 in 5ab1e24
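To make the two options easier to compare, here is a minimal sketch that writes down the factor each one applies to the image latents at training and at inference time. `ScalingPlan`, `build_plans`, and the value `0.7` are hypothetical names and numbers for illustration; the sketch only restates the arithmetic of the two options and does not settle which factor the pretrained weights expect.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScalingPlan:
    name: str
    train_factor: float      # factor applied to image latents during LoRA training
    inference_factor: float  # factor applied to image latents during inference

    def is_consistent(self, tol: float = 1e-9) -> bool:
        """True when training and inference apply the same factor."""
        return abs(self.train_factor - self.inference_factor) <= tol


def build_plans(s: float) -> List[ScalingPlan]:
    """Describe both options for a given scaling factor s."""
    return [
        # Option 1: multiplication removed in training, 1.0 * image_latents at inference.
        ScalingPlan("Option 1", train_factor=1.0, inference_factor=1.0),
        # Option 2: training multiplies by s, inference uses 1 / s * image_latents.
        ScalingPlan("Option 2", train_factor=s, inference_factor=1.0 / s),
    ]


for plan in build_plans(0.7):  # 0.7 is a placeholder value
    print(plan.name, plan.train_factor, plan.inference_factor, plan.is_consistent())
```

Under this reading, Option 1 applies the same factor in both phases, while Option 2 applies s in training and 1 / s at inference, which coincide only when s equals 1.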
CogVideoX 1.5 I2V cannot run inference on vertical videos
Question 3: Is this solution correct?
Other related issues
Symptom: vertical videos cannot be generated (width 480 x height 720 does run, but the result is poor); other aspect ratios fail, for example:
--width 768 --height 1360
: fails with the same kind of error, RuntimeError: Sizes of tensors must match except in dimension 3. Expected size 85 but got size 48 for tensor number 1 in the list.
--width 768 --height 1080
: fails with the same kind of error, RuntimeError: Sizes of tensors must match except in dimension 3. Expected size 67 but got size 48 for tensor number 1 in the list.
--width 768 --height 960
: fails with the same kind of error, RuntimeError: Sizes of tensors must match except in dimension 3. Expected size 60 but got size 48 for tensor number 1 in the list.
Cause: the RoPE (rotary position embedding) logic assumes that sample_width is greater than sample_height; they are set to 170 and 96 respectively.
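The sizes in the error messages can be reproduced with simple arithmetic. The sketch below assumes a VAE spatial downsampling factor of 8 and a transformer patch size of 2; neither value is stated in this issue, and both are chosen here because they reproduce the reported numbers. The helper names `patch_grid` and `check_resolution` are hypothetical.

```python
VAE_SPATIAL_FACTOR = 8  # assumption: pixels -> latents downsampling
PATCH_SIZE = 2          # assumption: latents -> patches
SAMPLE_HEIGHT = 96      # from the issue text
SAMPLE_WIDTH = 170      # from the issue text


def patch_grid(pixels: int) -> int:
    """Number of patches along one spatial axis for a given pixel size."""
    return pixels // VAE_SPATIAL_FACTOR // PATCH_SIZE


def check_resolution(width: int, height: int) -> dict:
    """Compare the requested patch grid against the configured maximum grid."""
    max_h = SAMPLE_HEIGHT // PATCH_SIZE
    max_w = SAMPLE_WIDTH // PATCH_SIZE
    grid_h, grid_w = patch_grid(height), patch_grid(width)
    return {
        "grid_height": grid_h,
        "grid_width": grid_w,
        "max_height": max_h,
        "max_width": max_w,
        "fits": grid_h <= max_h and grid_w <= max_w,
    }


for w, h in [(768, 1360), (768, 1080), (768, 960), (480, 720)]:
    print((w, h), check_resolution(w, h))
```

Under these assumptions the three failing heights map to grid heights of 85, 67, and 60 against a maximum of 48, matching the "Expected size ... but got size 48" messages, while 480 x 720 gives a grid height of 45 and fits, which is consistent with that resolution being the only vertical one that runs.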
Solution: the model configuration has to be modified, at
CogVideoX1.5-5B-I2V/transformer/config.json
. To generate vertical videos, set