Add initialization URAE, PiSSA for flux #2001
base: sd3
Conversation
PiSSA changes the base model weight to a residual weight instead of its original weight. The suggestion is to take the delta against the original up/down decomposition from before training and apply SVD to that to get a LoRA-like result. The paper also suggests 2R, so if you set rank 8 it will result in a rank-16 LoRA. A minimal sketch of the split is shown after this list.

Added progress reporting to the initialization so you can see it is doing something, but the progress bars are not completely accurate yet.

Added tests for lora_util and also for lora_flux. I was running into issues and would have had to start a whole training run to test things, so these tests make testing faster and also guard against future regressions.

Current issues:

- PiSSA is resulting in fp16 weights, so check the dtype when moving things around to keep the target dtype.

Otherwise PiSSA should be working as expected, and URAE then needs to be updated to be similar.
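A minimal sketch of that principal-singular-value split, assuming a plain 2D linear weight; the function name and return layout are illustrative, not this PR's actual code:

```python
import torch

def pissa_init(weight: torch.Tensor, rank: int):
    """Split a pretrained weight into principal low-rank factors plus a residual.

    Returns (lora_down, lora_up, residual) such that
    residual + lora_up @ lora_down == weight (up to float error).
    """
    # Full SVD of the original weight (out_features x in_features)
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    # The principal components carry the largest singular values; splitting
    # sqrt(S) across both factors keeps the two sides balanced in scale.
    sqrt_S = torch.sqrt(S[:rank])
    lora_up = U[:, :rank] * sqrt_S                # (out, rank)
    lora_down = sqrt_S.unsqueeze(1) * Vh[:rank]   # (rank, in)
    # The base model keeps only the residual, so at step 0 the combined
    # weight still equals the original pretrained weight exactly.
    residual = weight.float() - lora_up @ lora_down
    return lora_down, lora_up, residual
```

With the paper's 2R suggestion, a requested rank of 8 would end up calling this with rank=16.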
- Change lora_util to network_utils to match terms
- Add `pythonpath = .` to the pytest config so tests resolve the current directory
- Fix the device of the LoRA after PiSSA initialization so it returns to the proper device
Current issues:

PiSSA has been working for me and one other person so far. PiSSA does take some time to process, and it only uses a portion of the GPU, so the work could possibly be batched to better utilize the GPU; not sure how much more performance that would bring. Currently looking at 7-15 minutes to process (initialize or save) depending on the GPU.
Hello rockerBOO. Nice to meet you.
Flux is a starting point. It would work for any of the networks. I will try to make this work for SDXL soon.
How wonderful! If there are any tests, I will participate and give feedback.
Hey @rockerBOO, thank you so much for your efforts. If I use this pull request, can I fine-tune / train a LoRA for a Flux model with a low number of images using URAE? I mean, say I train a concept with 200 images at 2048x2048; is URAE for this? Thank you. If this is the case, I can test this.
@FurkanGozukara The URAE implementation here would just be the initialization part. It isn't as complete as PiSSA is at this point, though. URAE also has proportional attention and NTK scaling for the timestep embeddings, which is a separate thing, so I would hold off on URAE testing at this time. I think my wavelet loss PR combined with implementing the URAE parameters will end up producing the best result for higher-resolution images, though.
rockerBOO, `PiSSA-type conversion calculation` (code snippet omitted)
Maybe I wouldn't use PiSSA, even though it's similar to SVD. But using the spectral energy to merge two LoRAs with SVD might be an interesting approach.
I don't know how effective PiSSA is, but if it avoids degrading the original model as stated in the paper, I would like to use it actively. When I actually wrote the Python code and tried model merging with PiSSA applied, the noise seemed reduced. A LoRA merge may also inherit the characteristics of each.
Thank you @muooon. I have improved a few things with this now. It now properly supports "fast SVD", which computes a low-rank approximation of the SVD. For me it can now do 80 modules in 12 seconds, compared to 3 minutes before. Converting back to a LoRA at the end will still use the full SVD for now; it might be possible to do that faster. A sketch of the randomized approach is shown below.

I will be testing it a bit more, but then it will be ready. I want to get this completed for Flux before updating the other LoRAs to support these new methods.
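For reference, a sketch of that low-rank approximation using torch.svd_lowrank (randomized SVD); the oversampling and iteration counts are illustrative defaults, not the PR's exact settings:

```python
import torch

def fast_pissa_factors(weight: torch.Tensor, rank: int, niter: int = 4):
    """Approximate the top-`rank` singular triplets with randomized SVD.

    torch.svd_lowrank is far cheaper than a full SVD when
    rank << min(weight.shape), at the cost of a small approximation error.
    """
    # Oversample a little beyond the target rank; this is a common heuristic
    # for improving the quality of the recovered top subspace.
    q = min(rank + 8, min(weight.shape))
    U, S, V = torch.svd_lowrank(weight.float(), q=q, niter=niter)
    sqrt_S = torch.sqrt(S[:rank])
    lora_up = U[:, :rank] * sqrt_S                    # (out, rank)
    lora_down = sqrt_S.unsqueeze(1) * V[:, :rank].T   # (rank, in)
    residual = weight.float() - lora_up @ lora_down
    return lora_down, lora_up, residual
```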
I am always grateful for your courteous work. It is very good news that the process can proceed immediately with the high-speed SVD. Thank you very much.
- Introduced create_mock_nitter_environment() to simulate different nitter levels
- Added test_pissa_with_nitter_level_variations() to test initialization
- Covers low, medium, and high nitter configuration levels
- Tests precision, rank, scale, and memory variations

- Introduced test_urae_initialization_with_level_variations()
- Covered different precision levels
- Tested various rank and configuration scenarios
- Expanded test coverage for URAE initialization

- Added test_save_initialization_conversion to verify weight conversion
- Checks conversion of the PiSSA and URAE initialization methods
- Ensures weight keys are standardized during saving
- Verifies alpha preservation during conversion (a hypothetical test sketch follows)
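As an illustration of the kind of regression test these commits describe, a hypothetical pytest sketch; the test and helper names are made up, and the helper repeats the earlier split sketch so the file is self-contained:

```python
import torch

def pissa_init(weight: torch.Tensor, rank: int):
    # Same principal-split sketch as earlier in the thread.
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    sqrt_S = torch.sqrt(S[:rank])
    lora_up = U[:, :rank] * sqrt_S
    lora_down = sqrt_S.unsqueeze(1) * Vh[:rank]
    return lora_down, lora_up, weight.float() - lora_up @ lora_down

def test_pissa_split_reconstructs_original_weight():
    torch.manual_seed(0)
    weight = torch.randn(64, 32)
    lora_down, lora_up, residual = pissa_init(weight, rank=8)
    # By construction the residual plus the low-rank product recovers the
    # original weight, so training starts from the pretrained model exactly.
    assert torch.allclose(residual + lora_up @ lora_down, weight, atol=1e-5)
    assert lora_down.shape == (8, 32)
    assert lora_up.shape == (64, 8)
```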
@muooon Referring to the https://arxiv.org/abs/2505.14238v1 paper? This might be a little more involved, and it differs from the "LoRA" architecture, so it would require a separate implementation. I have been considering how we can more easily add different architectures. The most similar is how LyCORIS has implemented a modular system for different architectures, but it doesn't have the same customization that we have here. I have completed the automated tests for PiSSA and URAE now, so I just need to do some manual testing to make sure it's all working, and then this should be good to merge. Once completed, we can apply it to the other LoRA implementations. This process is to make sure our implementation will be accepted, as it changes some of how the initialization is done as well as the conversion back to the LoRA format.
Thank you for your explanation. ABBA is complicated by its A and B parameters and so on, so I understand that many additional updates would be necessary. I'm sorry for bringing it up without thinking deeply. I finally understand that this pull request is an important groundwork for the future. I don't mean to be a bother, and I apologize. Thank you for your polite replies over and over again. I'm looking forward to URAE and PiSSA. Thank you again.
Add more initialization support for other algorithms like URAE (Ultra-Resolution Adaptation with Ease) and PiSSA (Principal Singular values and Singular vectors Adaptation).
https://arxiv.org/abs/2503.16322
https://github.com/GraphPKU/PiSSA
fast SVD for PiSSA
Options in this PR:

- `urae`
- `pissa`

If no initialize value is passed, it uses the current default, but maybe we want a specific value for the default for flexibility. Both options (`urae`, `pissa`) will have slower initialization due to the SVD and matrix operations performed at initialization to align with the original module weights. A hypothetical dispatch sketch follows.
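A hypothetical sketch of how the option could dispatch to the two initializers; the option name `initialize` comes from the description above, but the function names and plumbing are illustrative (`pissa_init` and `urae_init` refer to the sketches elsewhere in this thread):

```python
from typing import Callable, Optional

def select_initializer(initialize: Optional[str]) -> Optional[Callable]:
    # `pissa_init` and `urae_init` are the illustrative sketches defined
    # elsewhere in this thread, not this PR's actual functions.
    if initialize == "pissa":
        return pissa_init  # principal singular values: faster convergence
    if initialize == "urae":
        return urae_init   # minor singular values: high-resolution training
    # No explicit value falls back to the current default init; a dedicated
    # "default" sentinel would make this more flexible, as noted above.
    return None
```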
URAE was found to help with high-resolution training. It is very similar to PiSSA, except it uses the lowest singular values instead of the principal ones.
PiSSA was found to have faster convergence.
URAE
URAE would require a couple more things beyond this initialization, but the initialization part works within this effort. A mirror-image sketch of the split is below.
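Following the description above that URAE uses the lower values, a mirror-image sketch of the split that takes the trailing (minor) singular triplets; again illustrative, not this PR's actual code:

```python
import torch

def urae_init(weight: torch.Tensor, rank: int):
    """Split off the smallest singular directions instead of the largest.

    The trainable factors start from the tail of the spectrum (the minor
    components), which the URAE paper ties to high-resolution adaptation.
    """
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    # Take the trailing `rank` singular triplets (lowest energy)
    sqrt_S = torch.sqrt(S[-rank:])
    lora_up = U[:, -rank:] * sqrt_S                # (out, rank)
    lora_down = sqrt_S.unsqueeze(1) * Vh[-rank:]   # (rank, in)
    residual = weight.float() - lora_up @ lora_down
    return lora_down, lora_up, residual
```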