
model_cpu_offload failed in unidiffusers pipeline #11443


Open
yao-matrix opened this issue Apr 29, 2025 · 1 comment
Labels: bug (Something isn't working)

@yao-matrix
Contributor

yao-matrix commented Apr 29, 2025

Describe the bug

UniDiffuser's model_cpu_offload fails with the error shown in the Reproduction section below.

I took a deeper look. In this failure, self.text_decoder.encode is called after text_encoder and before image_encoder. The problem is that encode only exercises a submodule of text_decoder and is not itself part of model_cpu_offload_seq, so no hook is registered for it during enable_model_cpu_offload and it ends up as an orphan. I don't have a good idea for a fix, since it is an embedded submodule inside a sub-model and whether it runs at all is a runtime decision based on reduce_text_emb_dim, but I'm willing to contribute the fix.
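To make the failure mode concrete, here is a minimal, self-contained sketch of the effect described above (class and attribute names are illustrative, taken from this issue and the log in the Reproduction section, not from the actual diffusers source): because the encode path never got an offload hook, its weights stay on the CPU while the incoming prompt embeddings, produced by the already-hooked text_encoder, are on the accelerator.

import torch

# Toy stand-in for the text decoder: encode() goes through a Linear submodule and
# is invoked directly rather than through forward(), so no offload hook ever moves
# its weights off the CPU.
class ToyTextDecoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.encode_prefix = torch.nn.Linear(32, 32)

    def encode(self, hidden_states):
        return self.encode_prefix(hidden_states)

# Pick whatever accelerator is available; on a CPU-only machine the mismatch
# below obviously cannot be reproduced.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    device = "xpu"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

decoder = ToyTextDecoder()                              # weights remain on CPU
prompt_embeds = torch.randn(1, 77, 32, device=device)  # e.g. output of the offloaded text_encoder
decoder.encode(prompt_embeds)                           # RuntimeError: cpu vs. xpu:0 (or cuda:0)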

@yiyixuxu @sayakpaul @DN6

Reproduction

Run pytest -rA tests/pipelines/unidiffuser/test_unidiffuser.py::UniDiffuserPipelineFastTests::test_model_cpu_offload_forward_pass and you will see the error log below (captured on XPU; the same issue happens on CUDA too).

self = Linear(in_features=32, out_features=32, bias=True)
input = tensor([[[-0.8407, -0.3964, -0.6832, ..., -0.2908, 0.1523, -1.0043],
[-0.8155, -0.1579, 0.6659, ..., 1.4...375, -0.4626, -0.3352],
[-1.2005, -0.1820, 0.4218, ..., -0.3822, -0.5105, -0.2234]]],
device='xpu:0')

def forward(self, input: Tensor) -> Tensor:
    # print(f"input.device: {input.device}, weight device: {self.weight.device}")
  return F.linear(input, self.weight, self.bias)

E RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and xpu:0! (when checking argument for argument mat1 in method wrapper_XPU_addmm)
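For reference, a rough standalone reproduction outside the test suite would look something like the sketch below; the checkpoint id and prompt are only illustrative, and the call goes through the same text-to-image path the test exercises.

import torch
from diffusers import UniDiffuserPipeline

# Load the public UniDiffuser checkpoint and enable model-level CPU offload, which
# registers offload hooks only for the components listed in model_cpu_offload_seq.
pipe = UniDiffuserPipeline.from_pretrained("thu-ml/unidiffuser-v1", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# The text-to-image path can call text_decoder.encode() (a runtime decision based
# on reduce_text_emb_dim), which is where the device mismatch above shows up.
sample = pipe(prompt="an astronaut riding a horse", num_inference_steps=20)
image = sample.images[0]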

Logs

System Info

N/A

Who can help?

No response

@yao-matrix yao-matrix added the bug Something isn't working label Apr 29, 2025
@sayakpaul
Member

Thanks for the issue. I think we'd be open to your PR for fixing this!
