
Add initialization URAE, PiSSA for flux #2001


Draft: wants to merge 36 commits into base branch sd3

Conversation

@rockerBOO (Contributor) commented Mar 24, 2025

Add more initialization support for other algorithms: URAE (Ultra-Resolution Adaptation with Ease) and PiSSA (Principal Singular values and Singular vectors Adaptation).

https://arxiv.org/abs/2503.16322
https://github.com/GraphPKU/PiSSA

Command line:

--network_args "initialize=pissa"

Config file:

network_args = [
    "initialize=pissa"
]

Fast SVD for PiSSA:

network_args = [
    "initialize=pissa_niter_4"
]

Options in this PR: urae and pissa. If no initialize value is passed, the current default is used, but we may want a specific value for the default for flexibility.

Both options (urae, pissa) will have a slower initialization speed due to the SVD and matrix operations needed at initialization to align with the original module weights.

URAE was found to help with high-resolution training. It is very similar to PiSSA, except it adapts the minor (smallest) singular values instead of the principal ones.
PiSSA was found to converge faster.

[Figure: PiSSA loss landscape and llama3 results, from the PiSSA paper]

[Figure: screenshot from the URAE paper, "Ultra-Resolution Adaptation with Ease" (arXiv:2503.16322)]

URAE would require a couple more pieces beyond this initialization, but it fits into this initialization effort.

@rockerBOO (Contributor, Author) commented Mar 25, 2025

PiSSA changes the base model weight into a residual weight instead of its original weight. The suggested conversion is to take the delta against the original up/down decomposition from before training and apply SVD to it, producing a LoRA-like result. The paper also suggests saving at r*2, so setting rank 8 results in a rank-16 LoRA.
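
For illustration, here is a minimal sketch of both steps as I understand them from the paper; pissa_init and pissa_to_lora are hypothetical helper names for this sketch, not this PR's actual API:

import torch

def pissa_init(weight: torch.Tensor, rank: int):
    # Split a frozen weight into a trainable principal low-rank pair
    # (lora_up @ lora_down) plus a frozen residual that replaces the
    # original weight in the base model.
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    sqrt_s = S[:rank].sqrt()
    lora_up = U[:, :rank] * sqrt_s                   # (out_features, rank)
    lora_down = sqrt_s.unsqueeze(1) * Vh[:rank, :]   # (rank, in_features)
    residual = weight.float() - lora_up @ lora_down
    return lora_up, lora_down, residual

def pissa_to_lora(weight_orig, residual, lora_up, lora_down):
    # Convert back to a standard LoRA against the *original* base weight.
    # The delta (residual + up @ down) - weight_orig has rank up to 2*r,
    # which is why a rank-8 run produces a rank-16 LoRA.
    r = lora_up.shape[1]
    delta = (residual + lora_up @ lora_down) - weight_orig.float()
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    sqrt_s = S[:2 * r].sqrt()
    new_up = U[:, :2 * r] * sqrt_s
    new_down = sqrt_s.unsqueeze(1) * Vh[:2 * r, :]
    return new_up, new_down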

Added progress reporting to the initialization so you can see it is doing something, but the progress bars are not completely accurate yet.

Added tests for lora_util and lora_flux. I was running into issues and would have to start up a whole training run to test things, so these tests make the testing process faster and also guard against regressions in the future.

Current issues:

  • Check if the SVD calculations happen on the correct device. With block swapping, the original module might not be on the ideal computation device, and during initialization the LoRA module is on the CPU rather than a GPU.
  • Check that we are not losing grad information during initialization, where appropriate.
  • Check for performance optimizations in the initialization.

Otherwise PiSSA should be working as expected; URAE then needs to be updated to match.

@rockerBOO (Contributor, Author):

PiSSA is producing fp16 weights, so check the dtype when moving things around to keep the target dtype.
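
A minimal sketch of the kind of dtype bookkeeping this needs; module here stands for any Linear being initialized, not the PR's actual code:

import torch

def reinit_weight(module: torch.nn.Linear):
    # Remember the target dtype and device before doing SVD work in fp32
    target_dtype = module.weight.dtype
    target_device = module.weight.device
    w = module.weight.detach().to(torch.float32)
    # ... SVD / residual computation on w happens here ...
    # Restore the original dtype and device when writing back
    module.weight.data = w.to(target_device, dtype=target_dtype)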

Commits:
- Change lora_util to network_utils to match terms.
- Revert commit 67f8e17, reversing changes made to 9d7e2dd.
- Add pythonpath = . to pytest to get the current directory.
- Fix device of LoRA after PiSSA initialization to return to the proper device.
@rockerBOO (Contributor, Author):

Current issues:

  • Initialization progress is not accurate, giving an incorrect picture of time remaining and actual progress.
  • URAE is not yet complete.

PiSSA has been working for me and one other user so far.

PiSSA does take some time to process, and it only uses part of the GPU, so it could possibly be batched to utilize the GPU better; I'm not sure how much more performance that would bring. It currently takes 7-15 minutes to process (initialize or save), depending on the GPU.
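
A sketch of how batching might look for modules that share a weight shape; torch.linalg.svd accepts batched inputs, so one call can cover many modules:

import torch

# Stack same-shaped weights into a (B, m, n) tensor and decompose them
# in a single batched call instead of B separate ones.
weights = torch.stack([torch.randn(1024, 1024) for _ in range(8)], dim=0)
U, S, Vh = torch.linalg.svd(weights, full_matrices=False)
# U: (8, 1024, 1024), S: (8, 1024), Vh: (8, 1024, 1024)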

@muooon commented May 5, 2025

Hello rockerBOO. Nice to meet you.
I have read the URAE and PiSSA papers you introduced many times and was amazed at how wonderful the update methods are! Being able to train with fewer resources is amazing.
Since the title says "for FLUX," it seems this runs on FLUX, but I would like to use these two features on SDXL as well. Will they be available for SDXL in the future?


@rockerBOO (Contributor, Author):

Flux is a starting point; it would work for any of the networks. I will try to make this work for SDXL soon.



@muooon commented May 5, 2025

Flux is a starting point; it would work for any of the networks. I will try to make this work for SDXL soon.


How wonderful!!!! If there are any tests, I will participate and give feedback.
Thank you for your prompt and kind reply. I appreciate it.

@FurkanGozukara commented May 5, 2025

Hey @rockerBOO, thank you so much for your efforts. If I use this pull request, can I fine-tune / train a LoRA for a Flux model with a low number of images using URAE? Let's say I train a concept with 200 images at 2048x2048.

Is URAE for this? Thank you.

If this is the case, I can test this.

@rockerBOO (Contributor, Author):

@FurkanGozukara The URAE implementation here only covers the initialization part, and it isn't as complete as PiSSA at this point. URAE also has proportional attention and NTK scaling for the timestep embeddings, which are separate pieces, so I would hold off on URAE testing for now. I think my wavelet loss PR plus implementing the URAE parameters will end up producing the best results for higher-resolution images, though.

@muooon commented Jun 1, 2025

rockerBOO, let me ask a question. Regarding PiSSA: when extracting a difference LoRA between two (local) models, could it be created as a PiSSA-type LoRA? Also, could an existing LoRA, treated as a model difference, be converted to a PiSSA type?

A PiSSA-type conversion calculation:

import torch
from concurrent.futures import ThreadPoolExecutor

# Assumed to be defined elsewhere in the script:
# reference_keys, lora_weights, diff_model, device, rank, alpha, num_threads

def process_layer(name, delta):
    original_name = name if name in reference_keys else f"model.{name}"

    if delta.ndim != 2:
        # Copy the original data instead of skipping (tensors that are
        # not 2-D cannot be converted with PiSSA)
        print(f"Copying without conversion: {original_name} (ndim={delta.ndim})")
        lora_weights[original_name] = delta.clone()
        return

    if device == "cuda":
        delta = delta.to(device)

    try:
        # Singular value decomposition (SVD), in fp32 for stability
        U, S, Vh = torch.linalg.svd(delta.to(torch.float32), full_matrices=False)
    except Exception as e:
        print(f"SVD failed for {original_name}: {e}")
        return

    # Keep only singular directions carrying more than 1% of the spectral energy
    spectral_energy = S / S.sum()
    essential_values = (spectral_energy > 0.01).sum().item()
    adaptive_rank = min(rank, essential_values, U.shape[1], Vh.shape[0])

    # PiSSA-style reconstruction from the principal components
    optimized_delta = U[:, :adaptive_rank] @ torch.diag(S[:adaptive_rank]) @ Vh[:adaptive_rank, :]

    optimized_delta = optimized_delta.to(torch.float16).cpu()
    torch.cuda.empty_cache()

    lora_weights[original_name] = optimized_delta * alpha

# Process layers in parallel
with ThreadPoolExecutor(max_workers=num_threads) as executor:
    executor.map(lambda item: process_layer(item[0], item[1]), diff_model.items())
I would like to be able to use it like this with Kohya-SD-Script.
One more question: can I use PiSSA with SDXL?
Thank you for adding these various functions.

@rockerBOO (Contributor, Author):

Maybe I wouldn't call this PiSSA, even though it is similar in its use of SVD. But using the spectral energy to merge two LoRAs via SVD might be an interesting approach.
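
A rough sketch of that idea (merge_loras_svd is a hypothetical name): sum the two LoRA deltas, re-decompose with SVD, and keep only the components above a spectral-energy threshold:

import torch

def merge_loras_svd(up_a, down_a, up_b, down_b, energy_thresh=0.01):
    # Combine both low-rank deltas, then re-factorize the sum
    delta = up_a.float() @ down_a.float() + up_b.float() @ down_b.float()
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    # Keep singular directions carrying more than energy_thresh of the energy
    keep = max(1, int((S / S.sum() > energy_thresh).sum().item()))
    sqrt_s = S[:keep].sqrt()
    return U[:, :keep] * sqrt_s, sqrt_s.unsqueeze(1) * Vh[:keep, :]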

@muooon commented Jun 3, 2025

I don't know how effective PiSSA is in practice, but if it is as minimally invasive to the original model as the paper states, I would like to use it actively. When I actually wrote the Python code and tried model merging with PiSSA applied, the noise seemed reduced. LoRA merging may also end up inheriting the characteristics of each.
Anyway, rockerBOO, thank you for all the corrections you've made. First of all, Kohya's project itself is wonderful, and I'm also grateful to rockerBOO for supporting it. I also have respect and gratitude for the other people supporting it. I'd like to say thank you to everyone.

@rockerBOO (Contributor, Author) commented Jun 3, 2025

Thank you @muooon

I have improved a few things now. It properly supports "fast SVD", which computes a low-rank approximation of the SVD. For me it can now do 80 modules in 12 seconds, compared to 3 minutes before. Converting back to a LoRA at the end still uses the full SVD currently; it might be possible to make that faster as well.

initialize=pissa_niter_4 will run PiSSA's fast SVD with 4 iterations (recommended):

network_args = [
    "initialize=pissa_niter_4"
]
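
For reference, a sketch of the difference between the two paths; assuming the fast path maps to something like torch.svd_lowrank (an assumption, not confirmed against this PR's code):

import torch

w = torch.randn(3072, 3072)  # a Flux-sized linear weight, for illustration
rank = 16

# Full SVD: exact, but expensive for large matrices
U, S, Vh = torch.linalg.svd(w, full_matrices=False)

# Fast SVD: randomized low-rank approximation with 4 subspace iterations,
# roughly what "initialize=pissa_niter_4" selects
U2, S2, V2 = torch.svd_lowrank(w, q=rank, niter=4)  # note: returns V, not Vh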

I will test it a bit more, and then it will be ready. I want to get this completed for Flux before updating the other LoRA implementations to support these new methods.

@muooon commented Jun 16, 2025

I am always grateful for your courteous support. It is very good news that the high-speed SVD lets the process run right away. Thank you very much.
This may be somewhat related to PiSSA: I roughly coded a new method called ABBA and applied it to combining LoRAs, and the performance was good. PiSSA, Inf, and ABBA can all be called examples of using SVD; they are impressive because they enhance the effect of LoRA while preserving diversity.

Commits:
- Introduced create_mock_nitter_environment() to simulate different nitter levels
- Added test_pissa_with_nitter_level_variations() to test initialization
- Covers low, medium, and high nitter configuration levels
- Tests precision, rank, scale, and memory variations
- Introduced test_urae_initialization_with_level_variations()
- Covered different precision levels
- Tested various rank and configuration scenarios
- Expanded test coverage for URAE initialization
- Added test_save_initialization_conversion to verify weight conversion
- Checks conversion of PISSA and URAE initialization methods
- Ensures weight keys are standardized during saving
- Verifies alpha preservation during conversion
@rockerBOO (Contributor, Author):

@muooon Are you referring to the https://arxiv.org/abs/2505.14238v1 paper? This might be a little more involved, and it differs from the "LoRA" architecture, so it would require a separate implementation. I have been considering how we can more easily add different architectures. The most similar approach is how LyCORIS has implemented a modular system for different architectures, but it doesn't have the same customization that we have here.

I have completed the automated tests for PiSSA and URAE now, so I just need to do some manual testing to make sure it's all working, and then this should be good to merge. Once completed, we can apply it to the other LoRA implementations. This process is to make sure our implementation will be accepted, as it changes some of how the initialization is done as well as the conversion back to the LoRA format.


@muooon commented Jun 17, 2025

Thank you for your explanation. ABBA is complicated by its A and B matrices, parameters, and so on, so I understand that many additional updates would be necessary; I'm sorry for bringing it up without thinking it through. I now understand that this PR is an important confirmation step for the future. I don't mean to be a bother, and I apologize. Thank you for your polite replies, over and over again. I'm looking forward to URAE and PiSSA. Thank you again.

