|
| 1 | +# Performance |
| 2 | + |
| 3 | +This document describes how to measure the performance of the NeMo 2.x framework. |
| 4 | + |
| 5 | +Useful links and the original documentation: |
| 6 | +* [NVIDIA NeMo Performance Summary](https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/performance-summary.html) |
| 7 | +* [NVIDIA NeMo Performance Scripts](https://github.com/NVIDIA/NeMo/tree/main/scripts/performance/llm) |
| 8 | +* [NVIDIA NeMo Compatibility Matrix](https://docs.nvidia.com/nemo-framework/user-guide/latest/softwarecomponentversions.html) |
| 9 | + |
| 10 | +### Create Conda Environment |
| 11 | + |
| 12 | +```bash |
| 13 | +conda create -yn nemo python=3.12 |
| 14 | +conda activate nemo |
| 15 | +``` |
| 16 | + |
| 17 | +### Install NeMo |
| 18 | + |
| 19 | +Make sure that the NeMo version is compatible with the Docker image according to the [compatibility matrix](https://docs.nvidia.com/nemo-framework/user-guide/latest/softwarecomponentversions.html). |
| 20 | + |
| 21 | +```bash |
| 22 | +git clone git@github.com:NVIDIA/NeMo.git |
| 23 | +cd NeMo |
| 24 | +git checkout v2.5.0rc0 # 2.4.0 was broken https://github.com/NVIDIA/NeMo/issues/14392 |
| 25 | +pip install -e '.[all]' # -e makes recommended model configs loadable |
| 26 | +``` |
| 27 | + |
| 28 | +Optionally specify where to store the performance results: |
| 29 | + |
| 30 | +```bash |
| 31 | +export NEMORUN_HOME=/fsxl/.../nemo_run |
| 32 | +``` |
| 33 | + |
| 34 | +### Build Docker Image |
| 35 | + |
| 36 | +The Dockerfile should start with `FROM nvcr.io/nvidia/nemo:YY.MM` and then install EFA. Make sure that the Docker image is compatible with the NeMo version according to the [compatibility matrix](https://docs.nvidia.com/nemo-framework/user-guide/latest/softwarecomponentversions.html). |
| 37 | + |
| 38 | +```bash |
| 39 | +docker build --progress=plain -t aws-nemo:latest -f Dockerfile . |
| 40 | +enroot import -o ~/aws-nemo.sqsh dockerd://aws-nemo:latest |
| 41 | +``` |
| 42 | + |
| 43 | +### Run Performance Test |
| 44 | + |
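| 45 | +To enable EFA, export the following environment variables. `FI_PROVIDER=efa` selects the EFA libfabric provider; `NCCL_DEBUG=INFO` only increases NCCL log verbosity, which helps confirm that EFA is actually in use: |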
| 46 | + |
| 47 | +```bash |
| 48 | +export FI_PROVIDER=efa |
| 49 | +export NCCL_DEBUG=INFO |
| 50 | +``` |
| 51 | + |
| 52 | +## Recommended model configs |
| 53 | + |
| 54 | +### NVIDIA H100 (also applicable to H200) |
| 55 | + |
| 56 | +The values below are taken from `NeMo/scripts/performance/recommended_model_configs/model_configs_h100.csv`: |
| 57 | + |
| 58 | +| Model | #-GPUs | GBS | MBS | Sequence Length | TP | PP | CP | VP | EP | GA | |
| 59 | +|-----------|--------|-----|-----|-----------------|----|----|----|----|----|----| |
| 60 | +| LLAMA3-8B | 8 | 128 | 1 | 8192 | 1 | 1 | 2 | 1 | 1 | 32 | |
| 61 | + |
| 62 | +```bash |
| 63 | +python -m scripts.performance.llm.pretrain_llama3_8b \ |
| 64 | + --account $(whoami) --partition p5en -i ~/aws-nemo.sqsh \ |
| 65 | + --gpu h100 --num_gpus 8 -gb 128 -mb 1 -tp 1 -pp 1 -cp 2 -vp 1 -ep 1 |
| 66 | +``` |
| 67 | + |
| 68 | +| Model | #-GPUs | GBS | MBS | Sequence Length | TP | PP | CP | VP | EP | GA | |
| 69 | +|------------|--------|-----|-----|-----------------|----|----|----|----|----|----| |
| 70 | +| LLAMA3-70B | 64 | 128 | 1 | 8192 | 4 | 8 | 1 | 5 | 1 | 64 | |
| 71 | + |
| 72 | +```bash |
| 73 | +python -m scripts.performance.llm.pretrain_llama3_70b \ |
| 74 | + --account $(whoami) --partition p5en -i ~/aws-nemo.sqsh \ |
| 75 | + --gpu h100 --num_gpus 64 -gb 128 -mb 1 -tp 4 -pp 8 -cp 1 -vp 5 -ep 1 |
| 76 | +``` |
| 77 | + |
| 78 | +| Model | #-GPUs | GBS | MBS | Sequence Length | TP | PP | CP | VP | EP | GA | |
| 79 | +|---------------|--------|-----|-----|-----------------|----|----|----|----|----|----| |
| 80 | +| LLAMA3.1-405B | 128 | 64 | 1 | 8192 | 8 | 8 | 2 | 8 | 1 | 64 | |
| 81 | + |
| 82 | +```bash |
| 83 | +python -m scripts.performance.llm.pretrain_llama31_405b \ |
| 84 | + --account $(whoami) --partition p5en -i ~/aws-nemo.sqsh \ |
| 85 | + --gpu h100 --num_gpus 128 -gb 64 -mb 1 -tp 8 -pp 8 -cp 2 -vp 8 -ep 1 |
| 86 | +``` |
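
The launch commands above pass every column of the tables except GA (gradient accumulation). The sketch below shows how GA can be derived from the other columns. It assumes the usual Megatron-style relations, data-parallel size `DP = #-GPUs / (TP * PP * CP)` and `GA = GBS / (MBS * DP)`. The `grad_accum` helper is hypothetical and is not part of the NeMo scripts. VP and EP are ignored here, which is only safe because EP is 1 in all three rows.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NeMo): derive gradient-accumulation steps.
# Assumption: DP = num_gpus / (TP * PP * CP) and GA = GBS / (MBS * DP).
# Usage: grad_accum NUM_GPUS GBS MBS TP PP CP
grad_accum() {
  local num_gpus=$1
  local gbs=$2
  local mbs=$3
  local tp=$4
  local pp=$5
  local cp=$6
  local model_parallel=$(( tp * pp * cp ))

  # The GPU count must split evenly into model-parallel groups.
  if [ $(( num_gpus % model_parallel )) -ne 0 ]; then
    echo "num_gpus must be divisible by TP*PP*CP" >&2
    return 1
  fi
  local dp=$(( num_gpus / model_parallel ))

  # The global batch must split evenly into micro-batches across DP ranks.
  if [ $(( gbs % (mbs * dp) )) -ne 0 ]; then
    echo "GBS must be divisible by MBS*DP" >&2
    return 1
  fi
  echo $(( gbs / (mbs * dp) ))
}

grad_accum 8 128 1 1 1 2     # LLAMA3-8B     -> prints 32
grad_accum 64 128 1 4 8 1    # LLAMA3-70B    -> prints 64
grad_accum 128 64 1 8 8 2    # LLAMA3.1-405B -> prints 64
```

The printed values match the GA column in each table, which gives a quick consistency check when adapting a configuration to a different GPU count.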