NeMo 2 Performance instructions #812
fb010bf to 0dbc76a
```
export NEMORUN_HOME=/fsxl/.../nemo_run
```

### Build Docker Image
It's not clear to me from reading this whether we are telling users to extend the Docker image from nvcr.io with EFA variables.
Is that what we mean by "continue with EFA installation"?
If so, can we link out to the official instructions for installing EFA into a Docker container?
Please hold off on providing feedback or merging. Let's merge everything together once the perf assets are ready and converged.
```
--account $(whoami) --partition p6 -i ./aws-nemo.sqsh \
--gpu b200 -c fp8 --num_gpus 64 -gb 128 -mb 1 -tp 2 -pp 4 -cp 2 -vp 5 -ep 1
```
error
Maybe we should elaborate on which error we encountered?
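When chasing down an error like this, a quick consistency check on the parallelism flags can help rule out a misconfiguration. The sketch below assumes the flags carry their usual Megatron-style meaning (`-tp`, `-pp`, `-cp` as tensor, pipeline, and context parallel sizes; `-gb` and `-mb` as global and micro batch sizes). The function names are hypothetical and are not part of NeMo.

```python
# Hypothetical helpers (not NeMo APIs): sanity-check the parallel layout
# implied by the command above, under Megatron-style conventions.

def data_parallel_size(num_gpus: int, tp: int, pp: int, cp: int) -> int:
    """Data-parallel size left over after tensor/pipeline/context parallelism."""
    model_parallel = tp * pp * cp
    if num_gpus % model_parallel != 0:
        raise ValueError(
            f"num_gpus={num_gpus} is not divisible by tp*pp*cp={model_parallel}"
        )
    return num_gpus // model_parallel


def grad_accumulation_steps(global_batch: int, micro_batch: int, dp: int) -> int:
    """Micro-batches each data-parallel rank processes per optimizer step."""
    per_pass = micro_batch * dp
    if global_batch % per_pass != 0:
        raise ValueError(
            f"global_batch={global_batch} is not divisible by micro_batch*dp={per_pass}"
        )
    return global_batch // per_pass


# Values taken from the command in the diff:
# --num_gpus 64 -gb 128 -mb 1 -tp 2 -pp 4 -cp 2
dp = data_parallel_size(num_gpus=64, tp=2, pp=4, cp=2)
accum = grad_accumulation_steps(global_batch=128, micro_batch=1, dp=dp)
print(f"data parallel size: {dp}, gradient accumulation steps: {accum}")
```

Under these assumptions the layout is self-consistent (data parallel size 4, 32 accumulation steps), which suggests the error lies elsewhere. This is another reason to quote it in the document.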
@@ -0,0 +1,281 @@
# Performance

This document describes the process of performance measurements of NeMo 2.x framework.
- What is the desired output?
- How can we run the performance measurements for the multi-node distributed training case?
No description provided.