
Add initialization URAE, PiSSA for flux #2001


Draft: wants to merge 36 commits into base branch sd3

Conversation

@rockerBOO (Contributor) commented Mar 24, 2025

Add more initialization support for other algorithms: URAE (Ultra-Resolution Adaptation with Ease) and PiSSA (Principal Singular values and Singular vectors Adaptation).

https://arxiv.org/abs/2503.16322
https://github.com/GraphPKU/PiSSA

Command line:

--network_args "initialize=pissa"

Config file:

network_args = [
    "initialize=pissa"
]

Fast SVD for PiSSA:

network_args = [
    "initialize=pissa_niter_4"
]

Options in this PR: urae and pissa. If no initialize value is passed, the current default is used, but we may want a specific value for the default for flexibility.

Both options (urae, pissa) will have a slower initialization speed due to the SVD and matrix operations needed at initialization to align with the original module weights.

URAE was found to help with high-resolution training. It is very similar to PiSSA, except it adapts the minor (smallest) singular values instead of the principal ones.
PiSSA was found to converge faster.

[Figure: PiSSA loss landscape and llama3 results, from the PiSSA paper]

[Figure: screenshot from the URAE paper, "Ultra-Resolution Adaptation with Ease" (arXiv:2503.16322)]

URAE would require a couple more pieces beyond this initialization, but it fits into this initialization effort.

@rockerBOO (Contributor, Author) commented Mar 25, 2025

PiSSA changes the base model weight into a residual weight instead of its original weight. The suggested conversion is to take the delta against the original up/down decomposition from before training and apply SVD to it, producing a LoRA-like result. The paper also suggests saving at r*2, so setting rank 8 results in a rank-16 LoRA.
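
For illustration, here is a minimal sketch of both steps as I understand them from the paper; pissa_init and pissa_to_lora are hypothetical helper names for this sketch, not this PR's actual API:

import torch

def pissa_init(weight: torch.Tensor, rank: int):
    # Split a frozen weight into a trainable principal low-rank pair
    # (lora_up @ lora_down) plus a frozen residual that replaces the
    # original weight in the base model.
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    sqrt_s = S[:rank].sqrt()
    lora_up = U[:, :rank] * sqrt_s                   # (out_features, rank)
    lora_down = sqrt_s.unsqueeze(1) * Vh[:rank, :]   # (rank, in_features)
    residual = weight.float() - lora_up @ lora_down
    return lora_up, lora_down, residual

def pissa_to_lora(weight_orig, residual, lora_up, lora_down):
    # Convert back to a standard LoRA against the *original* base weight.
    # The delta (residual + up @ down) - weight_orig has rank up to 2*r,
    # which is why a rank-8 run produces a rank-16 LoRA.
    r = lora_up.shape[1]
    delta = (residual + lora_up @ lora_down) - weight_orig.float()
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    sqrt_s = S[:2 * r].sqrt()
    new_up = U[:, :2 * r] * sqrt_s
    new_down = sqrt_s.unsqueeze(1) * Vh[:2 * r, :]
    return new_up, new_down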

Added progress reporting to the initialization so you can see it is doing something, but the progress bars are not completely accurate yet.

Added tests for lora_util and lora_flux. I was running into issues and would have to start up a whole training run to test things, so these tests make the testing process faster and also guard against regressions in the future.

Current issues:

  • Check if the SVD calculations happen on the correct device. With block swapping, the original module might not be on the ideal computation device, and during initialization the LoRA module is on the CPU rather than a GPU.
  • Check that we are not losing grad information during initialization, where appropriate.
  • Check for performance optimizations in the initialization.

Otherwise PiSSA should be working as expected; URAE then needs to be updated to match.

@rockerBOO (Contributor, Author):

PiSSA is producing fp16 weights, so check the dtype when moving things around to keep the target dtype.
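
A minimal sketch of the kind of dtype bookkeeping this needs; module here stands for any Linear being initialized, not the PR's actual code:

import torch

def reinit_weight(module: torch.nn.Linear):
    # Remember the target dtype and device before doing SVD work in fp32
    target_dtype = module.weight.dtype
    target_device = module.weight.device
    w = module.weight.detach().to(torch.float32)
    # ... SVD / residual computation on w happens here ...
    # Restore the original dtype and device when writing back
    module.weight.data = w.to(target_device, dtype=target_dtype)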

Commits:
- Change lora_util to network_utils to match terms.
- Revert commit 67f8e17, reversing changes made to 9d7e2dd.
- Add pythonpath = . to pytest to get the current directory.
- Fix device of LoRA after PiSSA initialization to return to the proper device.
@rockerBOO (Contributor, Author):

Current issues:

  • Initialization progress is not accurate, giving an incorrect picture of time remaining and actual progress.
  • URAE is not yet complete.

PiSSA has been working for me and one other user so far.

PiSSA does take some time to process, and it only uses part of the GPU, so it could possibly be batched to utilize the GPU better; I'm not sure how much more performance that would bring. It currently takes 7-15 minutes to process (initialize or save), depending on the GPU.
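
A sketch of how batching might look for modules that share a weight shape; torch.linalg.svd accepts batched inputs, so one call can cover many modules:

import torch

# Stack same-shaped weights into a (B, m, n) tensor and decompose them
# in a single batched call instead of B separate ones.
weights = torch.stack([torch.randn(1024, 1024) for _ in range(8)], dim=0)
U, S, Vh = torch.linalg.svd(weights, full_matrices=False)
# U: (8, 1024, 1024), S: (8, 1024), Vh: (8, 1024, 1024)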

@muooon commented May 5, 2025

Hello rockerBOO. Nice to meet you.
I have read the URAE and PiSSA papers you introduced many times and was amazed at how wonderful the update methods are! Being able to train with fewer resources is amazing.
Since the title says "for FLUX," it seems this runs on FLUX, but I would like to use these two features on SDXL as well. Will they be available for SDXL in the future?


@rockerBOO (Contributor, Author):

Flux is a starting point; it would work for any of the networks. I will try to make this work for SDXL soon.



@muooon commented May 5, 2025

Flux is a starting point; it would work for any of the networks. I will try to make this work for SDXL soon.


How wonderful!!!! If there are any tests, I will participate and give feedback.
Thank you for your prompt and kind reply. I appreciate it.

@FurkanGozukara commented May 5, 2025

Hey @rockerBOO, thank you so much for your efforts. If I use this pull request, can I fine-tune / train a LoRA for a Flux model with a low number of images using URAE? Let's say I train a concept with 200 images at 2048x2048.

Is URAE for this? Thank you.

If this is the case, I can test this.

@rockerBOO (Contributor, Author):

@FurkanGozukara The URAE implementation here only covers the initialization part, and it isn't as complete as PiSSA at this point. URAE also has proportional attention and NTK scaling for the timestep embeddings, which are separate pieces, so I would hold off on URAE testing for now. I think my wavelet loss PR plus implementing the URAE parameters will end up producing the best results for higher-resolution images, though.

@muooon commented Jun 1, 2025

rockerBOO, let me ask a question. Regarding PiSSA: when extracting a difference LoRA between two (local) models, could it be created as a PiSSA-type LoRA? Also, could an existing LoRA, treated as a model difference, be converted to a PiSSA type?

A PiSSA-type conversion calculation:

import torch
from concurrent.futures import ThreadPoolExecutor

# Assumed to be defined elsewhere in the script:
# reference_keys, lora_weights, diff_model, device, rank, alpha, num_threads

def process_layer(name, delta):
    original_name = name if name in reference_keys else f"model.{name}"

    if delta.ndim != 2:
        # Copy the original data instead of skipping (tensors that are
        # not 2-D cannot be converted with PiSSA)
        print(f"Copying without conversion: {original_name} (ndim={delta.ndim})")
        lora_weights[original_name] = delta.clone()
        return

    if device == "cuda":
        delta = delta.to(device)

    try:
        # Singular value decomposition (SVD), in fp32 for stability
        U, S, Vh = torch.linalg.svd(delta.to(torch.float32), full_matrices=False)
    except Exception as e:
        print(f"SVD failed for {original_name}: {e}")
        return

    # Keep only singular directions carrying more than 1% of the spectral energy
    spectral_energy = S / S.sum()
    essential_values = (spectral_energy > 0.01).sum().item()
    adaptive_rank = min(rank, essential_values, U.shape[1], Vh.shape[0])

    # PiSSA-style reconstruction from the principal components
    optimized_delta = U[:, :adaptive_rank] @ torch.diag(S[:adaptive_rank]) @ Vh[:adaptive_rank, :]

    optimized_delta = optimized_delta.to(torch.float16).cpu()
    torch.cuda.empty_cache()

    lora_weights[original_name] = optimized_delta * alpha

# Process layers in parallel
with ThreadPoolExecutor(max_workers=num_threads) as executor:
    executor.map(lambda item: process_layer(item[0], item[1]), diff_model.items())
I would like to be able to use it like this with Kohya-SD-Script.
One more question: can I use PiSSA with SDXL?
Thank you for adding these various functions.

@rockerBOO (Contributor, Author):

Maybe I wouldn't call this PiSSA, even though it is similar in its use of SVD. But using the spectral energy to merge two LoRAs via SVD might be an interesting approach.
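
A rough sketch of that idea (merge_loras_svd is a hypothetical name): sum the two LoRA deltas, re-decompose with SVD, and keep only the components above a spectral-energy threshold:

import torch

def merge_loras_svd(up_a, down_a, up_b, down_b, energy_thresh=0.01):
    # Combine both low-rank deltas, then re-factorize the sum
    delta = up_a.float() @ down_a.float() + up_b.float() @ down_b.float()
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    # Keep singular directions carrying more than energy_thresh of the energy
    keep = max(1, int((S / S.sum() > energy_thresh).sum().item()))
    sqrt_s = S[:keep].sqrt()
    return U[:, :keep] * sqrt_s, sqrt_s.unsqueeze(1) * Vh[:keep, :]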

@muooon commented Jun 3, 2025

I don't know how effective PiSSA is in practice, but if it is as minimally invasive to the original model as the paper states, I would like to use it actively. When I actually wrote the Python code and tried model merging with PiSSA applied, the noise seemed reduced. LoRA merging may also end up inheriting the characteristics of each.
Anyway, rockerBOO, thank you for all the corrections you've made. First of all, Kohya's project itself is wonderful, and I'm also grateful to rockerBOO for supporting it. I also have respect and gratitude for the other people supporting it. I'd like to say thank you to everyone.

@rockerBOO (Contributor, Author) commented Jun 3, 2025

Thank you @muooon

I have improved a few things now. It properly supports "fast SVD", which computes a low-rank approximation of the SVD. For me it can now do 80 modules in 12 seconds, compared to 3 minutes before. Converting back to a LoRA at the end still uses the full SVD currently; it might be possible to make that faster as well.

initialize=pissa_niter_4 will run PiSSA's fast SVD with 4 iterations (recommended):

network_args = [
    "initialize=pissa_niter_4"
]
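
For reference, a sketch of the difference between the two paths; assuming the fast path maps to something like torch.svd_lowrank (an assumption, not confirmed against this PR's code):

import torch

w = torch.randn(3072, 3072)  # a Flux-sized linear weight, for illustration
rank = 16

# Full SVD: exact, but expensive for large matrices
U, S, Vh = torch.linalg.svd(w, full_matrices=False)

# Fast SVD: randomized low-rank approximation with 4 subspace iterations,
# roughly what "initialize=pissa_niter_4" selects
U2, S2, V2 = torch.svd_lowrank(w, q=rank, niter=4)  # note: returns V, not Vh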

I will test it a bit more, and then it will be ready. I want to get this completed for Flux before updating the other LoRA implementations to support these new methods.

@muooon commented Jun 16, 2025

I am always grateful for your courteous support. It is very good news that the high-speed SVD lets the process run right away. Thank you very much.
This may be somewhat related to PiSSA: I roughly coded a new method called ABBA and applied it to combining LoRAs, and the performance was good. PiSSA, Inf, and ABBA can all be called examples of using SVD; they are impressive because they enhance the effect of LoRA while preserving diversity.

Commits:
- Introduced create_mock_nitter_environment() to simulate different nitter levels
- Added test_pissa_with_nitter_level_variations() to test initialization
- Covers low, medium, and high nitter configuration levels
- Tests precision, rank, scale, and memory variations
- Introduced test_urae_initialization_with_level_variations()
- Covered different precision levels
- Tested various rank and configuration scenarios
- Expanded test coverage for URAE initialization
- Added test_save_initialization_conversion to verify weight conversion
- Checks conversion of PISSA and URAE initialization methods
- Ensures weight keys are standardized during saving
- Verifies alpha preservation during conversion
@rockerBOO (Contributor, Author):

@muooon Are you referring to the https://arxiv.org/abs/2505.14238v1 paper? This might be a little more involved, and it differs from the "LoRA" architecture, so it would require a separate implementation. I have been considering how we can more easily add different architectures. The most similar approach is how LyCORIS has implemented a modular system for different architectures, but it doesn't have the same customization that we have here.

I have completed the automated tests for PiSSA and URAE now, so I just need to do some manual testing to make sure it's all working, and then this should be good to merge. Once completed, we can apply it to the other LoRA implementations. This process is to make sure our implementation will be accepted, as it changes some of how the initialization is done as well as the conversion back to the LoRA format.


@muooon commented Jun 17, 2025

Thank you for your explanation. ABBA is complicated by its A and B matrices, parameters, and so on, so I understand that many additional updates would be necessary; I'm sorry for bringing it up without thinking it through. I now understand that this PR is an important confirmation step for the future. I don't mean to be a bother, and I apologize. Thank you for your polite replies, over and over again. I'm looking forward to URAE and PiSSA. Thank you again.

