-
Notifications
You must be signed in to change notification settings - Fork 6k
CogView4 (supports different length c and uc) #10649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 1 commit
Commits
Show all changes
88 commits
Select commit
Hold shift + click to select a range
2640bcf
init
zRzRzRzRzRzRzR eba11fa
Merge branch 'huggingface:main' into cogview4
zRzRzRzRzRzRzR 6163679
encode with glm
zRzRzRzRzRzRzR 6090ea7
draft schedule
zRzRzRzRzRzRzR c7d1227
feat(scheduler): Add CogView scheduler implementation
OleehyO e9f6626
Merge remote-tracking branch 'origin/cogview4' into cogview4
OleehyO 549b357
Merge branch 'huggingface:main' into cogview4
zRzRzRzRzRzRzR 004d002
Merge branch 'huggingface:main' into cogview4
zRzRzRzRzRzRzR f4457fb
feat(embeddings): add CogView 2D rotary positional embedding
OleehyO 5f8d33b
Merge remote-tracking branch 'origin/cogview4' into cogview4
OleehyO 9a93218
1
zRzRzRzRzRzRzR ca000dd
Update pipeline_cogview4.py
zRzRzRzRzRzRzR 7ab4a3f
fix the timestep init and sigma
zRzRzRzRzRzRzR 56ceaa6
update latent
zRzRzRzRzRzRzR a7179a2
draft patch(not work)
zRzRzRzRzRzRzR c9ddf50
Merge branch 'cogview4'
zRzRzRzRzRzRzR 2f30cc1
Merge pull request #2 from zRzRzRzRzRzRzR/main
zRzRzRzRzRzRzR e6b8907
fix
zRzRzRzRzRzRzR 0ab7260
[WIP][cogview4]: implement initial CogView4 pipeline
OleehyO f608f82
[WIP][cogview4][refactor]: Split condition/uncondition forward pass i…
OleehyO b86bfd4
use with -2 hidden state
zRzRzRzRzRzRzR c4d1e69
remove text_projector
zRzRzRzRzRzRzR 7916140
1
zRzRzRzRzRzRzR f8945ce
[WIP] Add tensor-reload to align input from transformer block
OleehyO bf7f322
[WIP] for older glm
zRzRzRzRzRzRzR dd6568b
use with cogview4 transformers forward twice of u and uc
zRzRzRzRzRzRzR 6f5407e
Merge branch 'huggingface:main' into cogview4
zRzRzRzRzRzRzR 9e5b991
Update convert_cogview4_to_diffusers.py
zRzRzRzRzRzRzR 36b1682
remove this
zRzRzRzRzRzRzR 804f5cc
Merge pull request #3 from zRzRzRzRzRzRzR/main
zRzRzRzRzRzRzR 16c2397
use main example
zRzRzRzRzRzRzR 601696d
change back
zRzRzRzRzRzRzR 84115dc
reset
zRzRzRzRzRzRzR 95a103f
setback
zRzRzRzRzRzRzR d932f67
back
zRzRzRzRzRzRzR b04f15d
back 4
zRzRzRzRzRzRzR 5d33f3f
Fix qkv conversion logic for CogView4 to Diffusers format
zRzRzRzRzRzRzR b889b37
back5
zRzRzRzRzRzRzR e239c3c
revert to sat to cogview4 version
zRzRzRzRzRzRzR 310da29
update a new convert from megatron
zRzRzRzRzRzRzR 3bd6d30
[WIP][cogview4]: implement CogView4 attention processor
OleehyO f826aec
[cogview4] implement CogView4 transformer block
OleehyO 8d8ed8b
Merge branch 'huggingface:main' into cogview4
zRzRzRzRzRzRzR bf1fdc8
with new attn
zRzRzRzRzRzRzR 6a3a07f
[bugfix] fix dimension mismatch in CogView4 attention
OleehyO de274f3
[cogview4][WIP]: update final normalization in CogView4 transformer
OleehyO e94999e
Merge remote-tracking branch 'origin/cogview4' into cogview4
OleehyO e238284
Merge branch 'huggingface:main' into cogview4
zRzRzRzRzRzRzR a9b1e16
Merge branch 'huggingface:main' into cogview4
zRzRzRzRzRzRzR 46277b2
1
zRzRzRzRzRzRzR ebbaa5b
put back
zRzRzRzRzRzRzR f1ccdd2
Update transformer_cogview4.py
zRzRzRzRzRzRzR 030a467
change time_shift
zRzRzRzRzRzRzR ad40575
Update pipeline_cogview4.py
zRzRzRzRzRzRzR 81d39ee
change timesteps
zRzRzRzRzRzRzR 45f9e88
fix
zRzRzRzRzRzRzR 1dbeaa8
change text_encoder_id
zRzRzRzRzRzRzR f209600
[cogview4][rope] align RoPE implementation with Megatron
OleehyO 992f5a3
[cogview4][bugfix] apply silu activation to time embeddings in CogView4
OleehyO 03a1c3b
[cogview4][chore] clean up pipeline code
OleehyO dd34794
Merge remote-tracking branch 'origin/cogview4' into cogview4
OleehyO 3dab073
[cogview4][scheduler] Implement CogView4 scheduler and pipeline
OleehyO 63982d6
now It work
zRzRzRzRzRzRzR 90a5706
Merge branch 'huggingface:main' into cogview4
zRzRzRzRzRzRzR d4748e0
add timestep
zRzRzRzRzRzRzR 95f851d
batch
zRzRzRzRzRzRzR cb56282
change convert scipt
zRzRzRzRzRzRzR fedf325
refactor pt. 1; make style
a-r-r-o-w 90d29c7
Merge branch 'huggingface:main' into cogview4
zRzRzRzRzRzRzR 4c01c9d
refactor pt. 2
a-r-r-o-w c1b8004
refactor pt. 3
a-r-r-o-w 9d55d0a
add tests
a-r-r-o-w 5e6de42
make fix-copies
a-r-r-o-w 30dd0ad
Merge branch 'main' into cogview4
a-r-r-o-w 2046cf2
update toctree.yml
a-r-r-o-w 39e1198
use flow match scheduler instead of custom
a-r-r-o-w b566a9f
Merge branch 'main' into cogview4
a-r-r-o-w b4c9fde
remove scheduling_cogview.py
a-r-r-o-w a137e17
add tiktoken to test dependencies
a-r-r-o-w da420fb
Update src/diffusers/models/embeddings.py
a-r-r-o-w 4003b9c
apply suggestions from review
a-r-r-o-w 35c0ec6
use diffusers apply_rotary_emb
a-r-r-o-w d328c5e
update flow match scheduler to accept timesteps
a-r-r-o-w d637d3a
Merge branch 'main' into cogview4
a-r-r-o-w 4c37ef0
fix comment
a-r-r-o-w 90c240b
apply review sugestions
a-r-r-o-w 5c11298
Merge branch 'main' into cogview4
a-r-r-o-w 2f12b7a
Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
a-r-r-o-w File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
use flow match scheduler instead of custom
- Loading branch information
commit 39e1198029b8df98cdda202066073734f00d7d6d
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit different from what we usually do. We don't use
self.scheduler.timesteps
returned from the call toretrieve_timesteps
because those timesteps are from after resolution-based timestep shifting is applied. For CogView4, it seems like we need to use the timesteps from before applying shifting, but sigmas from after applying shifting.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's really werid - but if it's on purpose (not a oversight or something), we need support it from
retrieve_timesteps
and also theset_timesteps
method from scheduler to accept both custom timesteps and custom sigmas. Either that or maybe add an option to calculate timesteps based on the sigmas pre-shitingotherwise I don't think it would function correctly for img2img or training, where you do not start from the first timestep and need to search against self.scheduler.timesteps
e.g. this function won't work
diffusers/src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
Line 299 in a0c2299
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have access to the original codebase yet, so it will be hard to check if it's an oversight. It is weird that we have to do it this way, but if we don't do it (that is having sigmas corresponding to timesteps), the final outputs come out with some residual noise.
Also seems like in my latest update I made a mistake doing
timesteps.astype(np.float32)
from some local testing. Basically, we want integer timesteps here first (to round down the float values from linspace), but then need float32 timesteps for our scheduler to not raise an error:diffusers/src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
Lines 366 to 372 in a0c2299
So, it will have to be something like
timesteps.astype(np.int64).astype(np.float32)
to be consistent with the behaviour when we started updating the PR and to not error out in our schedulerThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could return timesteps without shifting, then apply shifting on the fly in scheduler.step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, it seems like custom timesteps might be the way to go because this logic here is just really custom (even if we calculate the timesteps without shifting, we also need to do this round up thing first)
basically, you need to:
timesteps
toset_timesteps
diffusers/src/diffusers/schedulers/scheduling_euler_discrete.py
Line 323 in a0c2299
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do some more testing with a fresh mind in the morning to verify if we need differing timesteps vs sigmas here (to recheck if it is a possible oversight or not). I do think I did everything correctly when I tried earlier today, and as a result we might need to do what you mentioned, but wouldn't hurt to delay a little longer and verify again if it might help save us a bunch of changes.
@zRzRzRzRzRzRzR If it would be possible to share just the scheduler implemention related files with us, it would really help us understand if changes are required. No problem if not :) We can wait for the official release from THUDM and update our implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're going to have to go forward with the update regarding both timesteps and sigmas being provided. Almost anything else I try seems to result in residual noise. I've pushed some changes regarding handling with some additional comments in
.step()
for better understanding. Okay to remove if it's not really required