
[Bugfix] Fix torch_npu segmentation fault when pin_memory is enabled while creating a tensor #597


Status: Open, 1 commit to merge into base branch main

@shen-shanshan
Collaborator

@shen-shanshan commented Apr 21, 2025

What this PR does / why we need it?

Fix a segmentation fault in torch 2.5.1 that occurs when `pin_memory` is enabled while creating a tensor with `torch.tensor`.

Does this PR introduce any user-facing change?

How was this patch tested?

…le creating a tensor using

Signed-off-by: shen-shanshan <467638484@qq.com>
@wangxiyuan
Collaborator

Please update the commit message to make it readable.

@@ -49,6 +49,9 @@
FlexibleArgumentParser = None

os.environ["RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES"] = "1"
# Fix the bug in torch 2.5.1 that raising segment fault when enable `pin_memory`
# while creating a tensor using `torch.tensor`.
os.environ["ACL_OP_INIT_MODE"] = "1"
Collaborator

When can ACL_OP_INIT_MODE be set to 0? If it is not hardcoded to 1, please add it to env.py. Otherwise, it's fine to add it here.
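For illustration, the env.py alternative raised above could look roughly like the sketch below. The registry layout and the names `env_variables`, `get_env`, and `export_acl_op_init_mode` are hypothetical and are not taken from this repository; the default of "1" only mirrors the value this PR hardcodes.

```python
import os
from typing import Any, Callable, Dict

# Hypothetical registry in the style of an env.py module: each entry maps a
# variable name to a callable that reads it lazily from the environment.
env_variables: Dict[str, Callable[[], Any]] = {
    # Assumed default of "1", mirroring the value hardcoded in this PR.
    "ACL_OP_INIT_MODE": lambda: os.environ.get("ACL_OP_INIT_MODE", "1"),
}


def get_env(name: str) -> Any:
    """Look up a registered variable, failing loudly on unknown names."""
    if name not in env_variables:
        raise KeyError(f"unregistered environment variable: {name}")
    return env_variables[name]()


def export_acl_op_init_mode() -> str:
    """Write the resolved value back so later-imported libraries can see it."""
    value = get_env("ACL_OP_INIT_MODE")
    os.environ["ACL_OP_INIT_MODE"] = value
    return value
```

A registry like this only pays off if the value is meant to be user-configurable; if it must always be 1, the direct assignment in the diff is simpler.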

Collaborator Author

@wangxiyuan I asked @mengwei805; this value can be hardcoded to 1.

@shen-shanshan
Collaborator Author

> Please update the commit message to make it readable.

ok.

@wangxiyuan
Collaborator

Thanks

@wangxiyuan
Collaborator

@ganyi1996ppo please double-check this PR as well. If it's fine, feel free to merge it.

@ganyi1996ppo
Collaborator

I have no clue about this. Why do we need ACL_OP_INIT_MODE for pin_memory to work, and why isn't it enabled by default? Can we find a runtime expert to explain this option? @wangxiyuan @shen-shanshan @wuhuikx

@shen-shanshan
Collaborator Author

> I have no clue about this. Why do we need ACL_OP_INIT_MODE for pin_memory to work, and why isn't it enabled by default? Can we find a runtime expert to explain this option? @wangxiyuan @shen-shanshan @wuhuikx

@ganyi1996ppo ACL_OP_INIT_MODE defaults to 0. When torch_npu initializes, it reads this environment variable and uses it to initialize ACL ops. @mengwei805, can you add more details?
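If torch_npu does read the variable at initialization, as described above, then the assignment has to happen before torch_npu is first imported. A minimal sketch of that ordering follows; the helper names `set_acl_op_init_mode` and `create_pinned_tensor` are hypothetical, and the import-time behavior of torch_npu is assumed from this comment rather than verified.

```python
import os


def set_acl_op_init_mode(value: str = "1") -> str:
    """Set ACL_OP_INIT_MODE in the process environment, as the PR does."""
    os.environ["ACL_OP_INIT_MODE"] = value
    return value


def create_pinned_tensor():
    """Import torch_npu only after the variable is set, then pin a tensor.

    Needs torch and torch_npu on an Ascend host, so it is defined here
    but not called.
    """
    set_acl_op_init_mode()
    # Imports are deferred on purpose: torch_npu is assumed to read
    # ACL_OP_INIT_MODE once, while it initializes.
    import torch
    import torch_npu  # noqa: F401

    # The call that this PR reports as crashing without the variable.
    return torch.tensor([1.0, 2.0, 3.0], pin_memory=True)
```

Setting the variable after torch_npu has already been imported would presumably have no effect, which is why the diff places the assignment at module import time.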

@wangxiyuan
Collaborator

@shen-shanshan The point of @ganyi1996ppo's question is that you should add more comments about the bug. For example: is there a related issue in torch-npu? How does torch-npu deal with this bug? What does ACL_OP_INIT_MODE do, and how does it work in torch-npu? We can't add this environment variable with no more justification than "torch-npu has a bug".

@shen-shanshan
Collaborator Author

> @shen-shanshan The point of @ganyi1996ppo's question is that you should add more comments about the bug. For example: is there a related issue in torch-npu? How does torch-npu deal with this bug? What does ACL_OP_INIT_MODE do, and how does it work in torch-npu? We can't add this environment variable with no more justification than "torch-npu has a bug".

That is complex. Give me some time to dive into the torch_npu source code to find the answer.
