
Lora Hotswap no clear documentation #11423


Open
vahe-toffee opened this issue Apr 26, 2025 · 1 comment


vahe-toffee commented Apr 26, 2025

Hello everyone.

Here is the scenario I have.

I have, say, 10 LoRAs that I would like to load and use depending on the request.

Option one:
Call load_lora_weights on every request. This reads from disk and moves the weights to the device, which is an expensive operation.

Option two:
Load all LoRAs up front and set the weights of the unused LoRAs to 0.0 with the set_adapters method. Not practical, since all LoRAs remain loaded and the forward pass becomes expensive.

Option three:
Find an elegant way of loading LoRAs to CPU and then moving them to the GPU as needed. While trying to do that, I noticed the new hotswap parameter of the load_lora_weights method. This is what the documentation says:

hotswap — (bool, optional) Defaults to False. Whether to substitute an existing (LoRA) adapter with the newly loaded adapter in-place. This means that, instead of loading an additional adapter, this will take the existing adapter weights and replace them with the weights of the new adapter. This can be faster and more memory efficient. However, the main advantage of hotswapping is that when the model is compiled with torch.compile, loading the new adapter does not require recompilation of the model. When using hotswapping, the passed adapter_name should be the name of an already loaded adapter. If the new adapter and the old adapter have different ranks and/or LoRA alphas (i.e. scaling), you need to call an additional method before loading the adapter

Could someone help me out here and name the mysterious method that needs to be called?

Optionally, it would also be great if someone could advise on my scenario overall.


songh11 commented Apr 27, 2025

You can use pipe.enable_lora_hotswap(target_rank=max_rank) to set the maximum rank. This issue might be helpful to you: https://github.com/huggingface/diffusers/issues/11408
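For reference, a minimal sketch of that flow, assuming an SDXL pipeline, a maximum adapter rank of 64, and hypothetical local LoRA paths; the two calls referenced in this thread are enable_lora_hotswap and load_lora_weights with hotswap=True:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Call this before loading the first LoRA when the adapters have different
# ranks/alphas; target_rank should cover the largest rank among all LoRAs.
pipe.enable_lora_hotswap(target_rank=64)

# Load the first LoRA under a fixed adapter name (path is hypothetical).
pipe.load_lora_weights("path/to/lora_a.safetensors", adapter_name="default")

# Optional: compile once; hotswapping later does not trigger recompilation.
pipe.unet = torch.compile(pipe.unet, mode="max-autotune")

# Per request, swap in another LoRA in place by reusing the same adapter name.
pipe.load_lora_weights(
    "path/to/lora_b.safetensors", adapter_name="default", hotswap=True
)
```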
