utility to dump details of all nodes in a cluster, into a csv file #652


Open: amitosaurus wants to merge 4 commits into base: main

Conversation

amitosaurus

@amitosaurus amitosaurus commented Apr 25, 2025

Issue #, if available:

Description of changes:
Creates a 'tools' directory for utility scripts and adds a 'dump_cluster_nodes_info.py' utility (originally named 'list_cluster_nodes.py') that dumps the details of all nodes in a cluster into a CSV file.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@amitosaurus amitosaurus marked this pull request as draft April 25, 2025 19:53
@amitosaurus amitosaurus marked this pull request as ready for review April 25, 2025 19:59
Collaborator

@KeitaW KeitaW left a comment

Thanks for submitting, but I don't see why we would want such a utility script.

 aws sagemaker list-cluster-nodes --region us-west-2 --cluster-name ml-cluster-trn1

should be enough. Could you elaborate on the motivation?

@amitosaurus
Author

The list-cluster-nodes command does not return the primary IP of each node, which we have found to be essential when troubleshooting critical issues.

@amitosaurus amitosaurus requested a review from KeitaW April 28, 2025 16:51
@shimomut
Collaborator

@KeitaW
So this script calls list_cluster_nodes() to list all nodes, with pagination handling, and then describe_cluster_node() for each node.
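For readers following along, here is a rough sketch of that pattern. It is not the script from this PR. It assumes a boto3-style SageMaker client is passed in, and that the responses carry the keys I believe the API uses ("ClusterNodeSummaries", "NextToken", "NodeDetails", "PrivatePrimaryIp"); please check these against the boto3 documentation. The function names and the column list are hypothetical.

```python
import csv

# Columns to export. Assumed to match keys under "NodeDetails" in the
# describe_cluster_node response; adjust after checking the API docs.
FIELDS = ["InstanceGroupName", "InstanceId", "InstanceType", "PrivatePrimaryIp"]


def iter_node_ids(client, cluster_name):
    """Yield each node's InstanceId, following NextToken across pages."""
    kwargs = {"ClusterName": cluster_name}
    while True:
        page = client.list_cluster_nodes(**kwargs)
        for summary in page.get("ClusterNodeSummaries", []):
            yield summary["InstanceId"]
        token = page.get("NextToken")
        if not token:
            break  # no more pages
        kwargs["NextToken"] = token


def dump_cluster_nodes(client, cluster_name, out):
    """Write one CSV row per node to the file-like `out`; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for node_id in iter_node_ids(client, cluster_name):
        # One describe call per node; this is where the primary IP comes from.
        response = client.describe_cluster_node(
            ClusterName=cluster_name, NodeId=node_id
        )
        details = response["NodeDetails"]
        # Missing fields are written as empty cells rather than raising.
        writer.writerow({name: details.get(name, "") for name in FIELDS})
        count += 1
    return count
```

With real credentials, this could presumably be driven by `boto3.client("sagemaker", region_name="us-west-2")` and a file opened with `newline=""`. Passing the client in as a parameter keeps the pagination logic easy to exercise with a stub.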

@shimomut
Collaborator

@amitosaurus, to make the intention of this script clearer for users, would it make sense to rename it to something like "dump_cluster_nodes.py" or "list_cluster_nodes_in_detail.py"?

updated script name to better reflect its functionality
@amitosaurus
Author

Updated the script name to "dump_cluster_nodes_info.py" to better reflect its functionality.

@KeitaW
Collaborator

KeitaW commented May 1, 2025

Noted. Kindly add a README inside the tools directory. Thank you!

Adding README.md that provides guidelines for usage of utility script(s) in the "tools" folder
@amitosaurus
Author

README.md file added under the tools folder

3 participants