utility to dump details of all nodes in a cluster, into a csv file #652
Thanks for submitting, but I don't see why we would want such a utility script.
aws sagemaker list-cluster-nodes --region us-west-2 --cluster-name ml-cluster-trn1
should be enough. Could you elaborate on the motivation?
The list-cluster-nodes command does not return the primary IP of each node, which we have found to be essential when troubleshooting critical issues.
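To illustrate the gap, here is a minimal sketch of the approach. It is not the script in this PR. It assumes the boto3 SageMaker client methods `list_cluster_nodes` and `describe_cluster_node`, and that the per-node details carry a `PrivatePrimaryIp` field. The helper names `collect_node_rows` and `rows_to_csv` and the CSV column names are hypothetical.

```python
import csv
import io


def collect_node_rows(client, cluster_name):
    """Return one dict per cluster node, including its private primary IP.

    `client` is expected to behave like boto3.client("sagemaker").
    The response keys used here are assumptions based on the SageMaker API.
    """
    rows = []
    kwargs = {"ClusterName": cluster_name}
    while True:
        page = client.list_cluster_nodes(**kwargs)
        for summary in page.get("ClusterNodeSummaries", []):
            node_id = summary["InstanceId"]
            # The list call only returns summaries, so fetch details per node.
            details = client.describe_cluster_node(
                ClusterName=cluster_name, NodeId=node_id
            )["NodeDetails"]
            rows.append(
                {
                    "instance_group": summary.get("InstanceGroupName", ""),
                    "instance_id": node_id,
                    "instance_type": summary.get("InstanceType", ""),
                    "private_primary_ip": details.get("PrivatePrimaryIp", ""),
                }
            )
        token = page.get("NextToken")
        if not token:
            return rows
        # Follow pagination until no token is returned.
        kwargs["NextToken"] = token


def rows_to_csv(rows):
    """Serialize the node rows to CSV text with a header line."""
    fields = ["instance_group", "instance_id", "instance_type", "private_primary_ip"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With real credentials this could be driven by something like `rows_to_csv(collect_node_rows(boto3.client("sagemaker", region_name="us-west-2"), "ml-cluster-trn1"))`. Passing the client in as a parameter keeps the helpers testable without AWS access.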
@KeitaW
@amitosaurus, to make the intention of this script clearer for users, would it make sense to rename it to something like "dump_cluster_nodes.py" or "list_cluster_nodes_in_detail.py"?
updated script name to better reflect its functionality
Updated the script name to "dump_cluster_nodes_info.py" to better reflect its functionality.
Noted. Kindly add README inside the |
Adding README.md that provides guidelines for usage of utility script(s) in the "tools" folder
README.md file added under the |
Issue #, if available:
Description of changes:
Creating a 'tools' directory for utility scripts, and adding a 'dump_cluster_nodes_info.py' utility (renamed from 'list_cluster_nodes.py') to dump details of all nodes in a cluster into a CSV file.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.