Hello everyone. Here is the scenario I have: I have, say, 10 LoRAs that I would like to load and use depending on the request.
Option one:
using load_lora_weights on every request - it reads from disk and moves the weights to the device, which is an expensive operation.
Option two:
load all LoRAs up front and set the weights of the unused LoRAs to 0.0 with the set_adapters method. Not practical, since all LoRAs stay loaded and the forward pass becomes expensive.
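For reference, here is a rough sketch of what I mean by option two (the model, paths, and adapter names are placeholders, not my real setup):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load every LoRA once, each under its own adapter name (placeholder paths).
lora_names = [f"lora_{i}" for i in range(10)]
for name in lora_names:
    pipe.load_lora_weights(f"my-loras/{name}", adapter_name=name)

def use_only(active: str) -> None:
    # Keep all adapters attached, but scale the unused ones to 0.0.
    weights = [1.0 if n == active else 0.0 for n in lora_names]
    pipe.set_adapters(lora_names, adapter_weights=weights)

# Per request, e.g.:
use_only("lora_3")
```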
Option three:
Find an elegant way of loading LoRAs to CPU and then moving them to GPU as needed. While I was trying to do that, I saw the new hotswap parameter in the load_lora_weights method. This is what the documentation says:
hotswap — (bool, optional) Defaults to False. Whether to substitute an existing (LoRA) adapter with the newly loaded adapter in-place. This means that, instead of loading an additional adapter, this will take the existing adapter weights and replace them with the weights of the new adapter. This can be faster and more memory efficient. However, the main advantage of hotswapping is that when the model is compiled with torch.compile, loading the new adapter does not require recompilation of the model. When using hotswapping, the passed adapter_name should be the name of an already loaded adapter. If the new adapter and the old adapter have different ranks and/or LoRA alphas (i.e. scaling), you need to call an additional method before loading the adapter
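For context, this is roughly the per-request flow I imagine for hotswapping; I am not sure it is the intended usage, which is partly why I am asking (paths and the adapter name are again placeholders):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load the first LoRA normally, under a fixed adapter name.
pipe.load_lora_weights("my-loras/lora_0", adapter_name="active")

# Later, per request: replace the weights of the already-loaded adapter
# in place, reusing the same adapter name, as the documentation describes.
pipe.load_lora_weights("my-loras/lora_7", adapter_name="active", hotswap=True)
```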
Could someone help me out here and name the mysterious function that needs to be called?
And, optionally, it would be great if someone could advise me on my scenario.