This guide outlines the parallelization strategy for Fully Sharded Data Parallel version 2 (FSDP2) training in NeMo RL.
## Fallback Priority
NeMo RL supports three parallelization strategies, applied in the following order of fallback priority:
### 1. Custom Parallel Plan
A user-defined custom parallel plan always takes precedence when available. For implementation details and usage, refer to the [Custom Parallel Plan Example](#custom-parallel-plan-example).
### 2. Optimized Parallel Plan
Optimized parallel plans are available for specific model architectures. They may offer superior performance compared to Hugging Face's tensor parallel implementation. This approach is used if no custom parallel plan is specified and the model class supports optimized parallelization.
### 3. Hugging Face Tensor Parallel Plan
The Hugging Face tensor parallel plan is the default. It's available for most models via `._tp_plan` and is used when neither a custom nor an optimized parallel plan is available.
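As a mental model of this priority order, here is a minimal, purely illustrative sketch. The function and argument names are hypothetical and do not reflect NeMo RL's actual internals; it only shows the order in which the three plans are considered.

```python
# Illustrative only: a hypothetical helper showing the fallback order,
# not NeMo RL's real implementation.
def resolve_parallel_plan(model, custom_parallel_plan=None, optimized_plans=None):
    # 1. A user-defined custom parallel plan always wins.
    if custom_parallel_plan is not None:
        return custom_parallel_plan

    # 2. Otherwise, use an optimized plan registered for this model class, if any.
    plan = (optimized_plans or {}).get(type(model).__name__)
    if plan is not None:
        return plan

    # 3. Fall back to the tensor parallel plan Hugging Face ships with the model.
    return getattr(model, "_tp_plan", None)
```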
## Custom Parallel Plan Example
A custom parallel plan should be defined in a separate file, such as the example provided in `examples/custom_parallel.py`.
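A custom parallel plan is typically a mapping from submodule paths to DTensor parallel styles, following the `torch.distributed.tensor.parallel.parallelize_module` convention. The sketch below assumes a Llama-style decoder; the module names and layouts are illustrative and may not match `examples/custom_parallel.py` exactly.

```python
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel
from torch.distributed.tensor.placement_types import Replicate

# Illustrative plan for a Llama-style decoder: keys are submodule paths
# (wildcards match every transformer layer), values are parallel styles.
custom_parallel_plan = {
    # Shard the embedding over the vocab dimension; token ids stay replicated.
    "model.embed_tokens": RowwiseParallel(input_layouts=Replicate()),
    # Attention projections: column-parallel in, row-parallel out.
    "model.layers.*.self_attn.q_proj": ColwiseParallel(),
    "model.layers.*.self_attn.k_proj": ColwiseParallel(),
    "model.layers.*.self_attn.v_proj": ColwiseParallel(),
    "model.layers.*.self_attn.o_proj": RowwiseParallel(),
    # MLP: gate/up column-parallel, down row-parallel.
    "model.layers.*.mlp.gate_proj": ColwiseParallel(),
    "model.layers.*.mlp.up_proj": ColwiseParallel(),
    "model.layers.*.mlp.down_proj": RowwiseParallel(),
    # Gather full logits on every rank.
    "lm_head": ColwiseParallel(output_layouts=Replicate()),
}
```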
To implement the custom parallel plan, either update the value of `custom_parallel_plan` in your YAML config file directly, or pass the override via the command line. For example:
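Both snippets below set the same configuration key, `policy.dtensor_cfg.custom_parallel_plan`; the launch command is only a placeholder for whichever NeMo RL training script you actually run.

```yaml
# In your training YAML config: point custom_parallel_plan at the fully
# qualified import path of the plan defined in examples/custom_parallel.py.
policy:
  dtensor_cfg:
    custom_parallel_plan: examples.custom_parallel.custom_parallel_plan
```

```bash
# Or override the same key from the command line
# (replace the script below with your actual training entry point).
python examples/run_grpo_math.py \
  policy.dtensor_cfg.custom_parallel_plan=examples.custom_parallel.custom_parallel_plan
```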