Commit 0dbc76a

NeMo 2 Performance instructions
1 parent 0defd7f commit 0dbc76a

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
# Performance

This document describes how to measure the performance of the NeMo 2.x framework.

Useful links and the original documentation:
* [NVIDIA NeMo Performance Summary](https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/performance-summary.html)
* [NVIDIA NeMo Performance Scripts](https://github.com/NVIDIA/NeMo/tree/main/scripts/performance/llm)
* [NVIDIA NeMo Compatibility Matrix](https://docs.nvidia.com/nemo-framework/user-guide/latest/softwarecomponentversions.html)

### Create Conda Environment

```bash
conda create -yn nemo python=3.12
conda activate nemo
```

### Install NeMo

Make sure that the NeMo version is compatible with the Docker image according to the [compatibility matrix](https://docs.nvidia.com/nemo-framework/user-guide/latest/softwarecomponentversions.html).

```bash
git clone git@github.com:NVIDIA/NeMo.git
cd NeMo
git checkout v2.5.0rc0   # 2.4.0 was broken: https://github.com/NVIDIA/NeMo/issues/14392
pip install -e '.[all]'  # -e (editable install) makes the recommended model configs loadable
```

Optionally specify where to store the performance results:

```bash
export NEMORUN_HOME=/fsxl/.../nemo_run
```

### Build Docker Image

The Dockerfile should start with `FROM nvcr.io/nvidia/nemo:YY.MM` and continue with the EFA installation. Make sure that the Docker image is compatible with the NeMo version according to the [compatibility matrix](https://docs.nvidia.com/nemo-framework/user-guide/latest/softwarecomponentversions.html).

```bash
docker build --progress=plain -t aws-nemo:latest -f Dockerfile .
enroot import -o ~/aws-nemo.sqsh dockerd://aws-nemo:latest
```

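The import step writes the image to `~/aws-nemo.sqsh`, while the launch commands in this document pass `-i ./aws-nemo.sqsh`, so it may be worth confirming the path before submitting a job. The following is a minimal sketch of such a preflight check; `require_image` is a hypothetical helper name and is not part of NeMo or enroot.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of NeMo or enroot): fail early if the
# squashfs image passed to the launch script is missing or empty.
require_image() {
    local image_path="$1"
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$image_path" ]; then
        echo "image not found or empty: $image_path" >&2
        return 1
    fi
    echo "using image: $image_path"
}
```

For example, `require_image ./aws-nemo.sqsh && python -m scripts.performance.llm.pretrain_llama3_8b ...` would launch the job only when the image is present at the given path.
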
### Run Performance Test

To enable EFA, export the following environment variables:

```bash
export FI_PROVIDER=efa   # select the EFA libfabric provider
export NCCL_DEBUG=INFO   # verbose NCCL logging, useful to confirm that EFA is in use
```

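With `NCCL_DEBUG=INFO` set, the job log can be searched to confirm that EFA was actually selected. The sketch below assumes that the aws-ofi-nccl plugin prints a `NET/OFI` line naming the provider (for example `NET/OFI Selected Provider is efa`); the exact wording is an assumption and may differ between plugin versions. `nccl_log_uses_efa` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if an NCCL log mentions the EFA provider.
# Assumption: with NCCL_DEBUG=INFO the aws-ofi-nccl plugin logs a line that
# contains "NET/OFI", "provider" and "efa"; the wording may vary by version.
nccl_log_uses_efa() {
    local log_file="$1"
    # -E extended regex, -i case-insensitive, -q quiet (exit status only)
    grep -Eiq 'NET/OFI.*provider.*efa' "$log_file"
}
```

Pass the path of the job's log file, whose location depends on your setup: `nccl_log_uses_efa path/to/job.log && echo "EFA in use"`.
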
## Recommended model configs

## NVIDIA H100 (also applicable to H200)

`NeMo/scripts/performance/recommended_model_configs/model_configs_h100.csv`

| Model     | #-GPUs | GBS | MBS | Sequence Length | TP | PP | CP | VP | EP | GA |
|-----------|--------|-----|-----|-----------------|----|----|----|----|----|----|
| LLAMA3-8B | 8      | 128 | 1   | 8192            | 1  | 1  | 2  | 1  | 1  | 32 |

```bash
python -m scripts.performance.llm.pretrain_llama3_8b \
  --account $(whoami) --partition p5en -i ./aws-nemo.sqsh \
  --gpu h100 --num_gpus 8 -gb 128 -mb 1 -tp 1 -pp 1 -cp 2 -vp 1 -ep 1
```

| Model      | #-GPUs | GBS | MBS | Sequence Length | TP | PP | CP | VP | EP | GA |
|------------|--------|-----|-----|-----------------|----|----|----|----|----|----|
| LLAMA3-70B | 64     | 128 | 1   | 8192            | 4  | 8  | 1  | 5  | 1  | 64 |

```bash
python -m scripts.performance.llm.pretrain_llama3_70b \
  --account $(whoami) --partition p5en -i ./aws-nemo.sqsh \
  --gpu h100 --num_gpus 64 -gb 128 -mb 1 -tp 4 -pp 8 -cp 1 -vp 5 -ep 1
```

| Model         | #-GPUs | GBS | MBS | Sequence Length | TP | PP | CP | VP | EP | GA |
|---------------|--------|-----|-----|-----------------|----|----|----|----|----|----|
| LLAMA3.1-405B | 128    | 64  | 1   | 8192            | 8  | 8  | 2  | 8  | 1  | 64 |

```bash
python -m scripts.performance.llm.pretrain_llama31_405b \
  --account $(whoami) --partition p5en -i ./aws-nemo.sqsh \
  --gpu h100 --num_gpus 128 -gb 64 -mb 1 -tp 8 -pp 8 -cp 2 -vp 8 -ep 1
```
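
The GA (gradient accumulation) column in the tables above is not passed on the command line; it appears to follow from the other values. The sketch below assumes the usual relation for dense models, where the data-parallel size is `GPUs / (TP * PP * CP)` and `GA = GBS / (MBS * DP)`. `grad_accum_steps` is a hypothetical helper, not part of the NeMo scripts, and can serve as a sanity check before changing the parallelism flags.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive gradient accumulation steps (GA) from a config row.
# Assumptions: DP = GPUs / (TP * PP * CP) and GA = GBS / (MBS * DP).
grad_accum_steps() {
    local num_gpus="$1"
    local gbs="$2"
    local mbs="$3"
    local tp="$4"
    local pp="$5"
    local cp="$6"
    local model_parallel=$(( tp * pp * cp ))
    if [ $(( num_gpus % model_parallel )) -ne 0 ]; then
        echo "num_gpus=$num_gpus is not divisible by TP*PP*CP=$model_parallel" >&2
        return 1
    fi
    local dp=$(( num_gpus / model_parallel ))
    if [ $(( gbs % (mbs * dp) )) -ne 0 ]; then
        echo "GBS=$gbs is not divisible by MBS*DP=$(( mbs * dp ))" >&2
        return 1
    fi
    echo $(( gbs / (mbs * dp) ))
}

# Arguments: num_gpus GBS MBS TP PP CP
grad_accum_steps 8 128 1 1 1 2     # LLAMA3-8B row, prints 32
grad_accum_steps 64 128 1 4 8 1    # LLAMA3-70B row, prints 64
grad_accum_steps 128 64 1 8 8 2    # LLAMA3.1-405B row, prints 64
```

The function returns a non-zero status when the GPU count or the global batch size does not divide evenly, which is a quick way to catch an inconsistent combination of flags before a job is queued.
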
