pbelevich
Contributor

No description provided.

@pbelevich pbelevich requested a review from amanshanbhag August 5, 2025 16:01
@pbelevich pbelevich marked this pull request as ready for review August 12, 2025 16:24
@pbelevich pbelevich requested review from KeitaW and nghtm August 12, 2025 16:24
export NEMORUN_HOME=/fsxl/.../nemo_run
```

### Build Docker Image
Contributor

It's not clear to me from reading this whether we are telling users to extend the Docker image from nvcr.io with EFA variables.

Is that what we mean by "continue with EFA installation"?

If so, can we link to the official instructions for installing EFA in a Docker container?

@amanshanbhag
Contributor

Please hold off on providing feedback or merging. Let's merge everything together once the perf assets are ready and have converged.

--account $(whoami) --partition p6 -i ./aws-nemo.sqsh \
--gpu b200 -c fp8 --num_gpus 64 -gb 128 -mb 1 -tp 2 -pp 4 -cp 2 -vp 5 -ep 1
```
error
Contributor

Maybe we should elaborate on what error we encountered?

@@ -0,0 +1,281 @@
# Performance

This document describes the process of performance measurements of NeMo 2.x framework.
Contributor

  • What is the desired output?
  • How can we run the performance measurements for the multi-node distributed training case?

4 participants