
Add LR scheduler to PPO recipe #2064

Open
SalmanMohammadi opened this issue Nov 25, 2024 · 6 comments
Labels: community help wanted (We would love the community's help completing this issue)

Comments

@SalmanMohammadi (Collaborator)

No description provided.

SalmanMohammadi added the community help wanted label on Nov 25, 2024
@Seoley commented Feb 18, 2025

Hi @SalmanMohammadi,

I attempted to solve this issue and made some modifications to ppo_full_finetune_single_device.py. I would like to verify whether my code is working correctly.

To verify this, is it enough to pass the three test suites (unit tests, recipe tests, regression tests)? If there are any additional steps I should take, I would appreciate your guidance on contributing to this issue.

Thanks in advance!

@pbontrager (Contributor)

Thank you for taking this on! You can push a draft PR and share it here so we can give better feedback. Ideally your changes should look similar to the LR scheduler solution in the SFT single-device recipe. For testing, you should mirror any tests that exist for the SFT version, include a sample run, and show a similar or better loss curve than with the scheduler turned off.
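
For reference, a minimal sketch of the kind of warmup-plus-decay schedule meant here, built on a generic LambdaLR helper rather than torchtune's actual API (the helper name and the step counts below are illustrative assumptions; in the recipe itself the scheduler would be instantiated from the config):

  # Sketch only: linear warmup to the base LR, then cosine decay to zero.
  # The function name and hyperparameters are hypothetical, not torchtune's.
  import math

  import torch
  from torch.optim.lr_scheduler import LambdaLR


  def cosine_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps):
      """Linearly warm up to the base LR, then decay to zero on a cosine curve."""

      def lr_lambda(step: int) -> float:
          if step < num_warmup_steps:
              return step / max(1, num_warmup_steps)
          progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
          return 0.5 * (1.0 + math.cos(math.pi * progress))

      return LambdaLR(optimizer, lr_lambda)


  # Usage: step the scheduler once per optimizer update. In the PPO recipe that
  # would be once per optimisation step, not once per generated trajectory batch.
  model = torch.nn.Linear(8, 8)
  optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
  scheduler = cosine_schedule_with_warmup(optimizer, num_warmup_steps=10, num_training_steps=100)

  for _ in range(100):
      optimizer.zero_grad()
      loss = model(torch.randn(2, 8)).sum()
      loss.backward()
      optimizer.step()
      scheduler.step()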

@Seoley commented Feb 19, 2025

@pbontrager Thank you for your advice!
I followed the procedure you suggested. Since the Mistral PPO recipe recommends an A100 for validation, I used one, but I ran into the following issue.

  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 1277, in recipe_main
    recipe.train()
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 1007, in train
    trajectory = self.generate_trajectory_batched(batch)
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 967, in generate_trajectory_batched
    trajectories.append(self.generate_trajectory(batch_input_ids))
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 843, in generate_trajectory
    with self.cache_ctx_manager(self.enable_kv_cache):
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/root/torchtune/torchtune/modules/common_utils.py", line 389, in local_kv_cache
    model.setup_caches(
  File "/root/torchtune/torchtune/modules/transformer.py", line 441, in setup_caches
    layer.setup_caches(
  File "/root/torchtune/torchtune/modules/transformer.py", line 62, in setup_caches
    self.attn.setup_cache(batch_size, dtype, max_seq_len=decoder_max_seq_len)
  File "/root/torchtune/torchtune/modules/attention.py", line 165, in setup_cache
    self.kv_cache = KVCache(
  File "/root/torchtune/torchtune/modules/kv_cache.py", line 39, in __init__
    "v_cache", torch.zeros(cache_shape, dtype=dtype), persistent=False
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_device.py", line 106, in __torch_function__
    return func(*args, **kwargs)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.01 GiB. GPU 0 has a total capacity of 79.14 GiB of which 2.59 GiB is free. Process 558874 has 76.54 GiB memory in use. Of the allocated memory 75.51 GiB is allocated by PyTorch, and 569.34 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Among the registered recipes, the only config that uses PPO is the Mistral 7B one, so I have no choice but to use Mistral 7B for validation. Although the Mistral 7B config recommends an A100, I still hit an OOM (out-of-memory) error.

The original code, without the LR scheduler added, also results in the same error. How should I resolve this?

I am running this on a single A100 GPU on RunPod.

System Specifications:

  • GPU: 1x A100 SXM (80GB VRAM)
  • MEM: 125GB RAM
  • CPU: 16 vCPU
  • Container Disk: 80GB

I would appreciate any advice. Thanks!
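
For what it's worth, the failing allocation is one of the per-layer KV cache tensors the recipe sets up before generation, and its size scales linearly with tokenizer.max_seq_len. A rough back-of-the-envelope estimate (the cache shape and the Mistral-7B-like geometry below are assumptions for illustration) shows why a 32768-token cache is so costly here:

  # Rough KV-cache size estimate; all numbers below are illustrative assumptions.
  # torchtune's KVCache holds a k_cache and a v_cache per attention layer, each
  # roughly of shape (batch_size, num_kv_heads, max_seq_len, head_dim).

  def kv_cache_bytes(batch_size, num_layers, num_kv_heads, head_dim,
                     max_seq_len, bytes_per_elem=2):  # bf16 -> 2 bytes
      per_tensor = batch_size * num_kv_heads * max_seq_len * head_dim * bytes_per_elem
      return 2 * num_layers * per_tensor  # k_cache + v_cache for every layer

  # Assumed Mistral-7B-like geometry: 32 layers, 8 KV heads, head_dim 128.
  for seq_len in (32_768, 512):
      gib = kv_cache_bytes(batch_size=4, num_layers=32, num_kv_heads=8,
                           head_dim=128, max_seq_len=seq_len) / 2**30
      print(f"max_seq_len={seq_len}: ~{gib:.2f} GiB of KV cache")

With an assumed batch of 4 this comes to roughly 16 GiB at max_seq_len=32768 versus about 0.25 GiB at 512, on top of the several models the PPO recipe already keeps in memory, which is consistent with the suggestion below to use a much smaller value.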

@Seoley commented Feb 19, 2025

Additionally, before encountering the OOM issue mentioned above, I faced another error. I would like to confirm whether my solution to this issue is appropriate.

When I ran the following command: tune run ppo_full_finetune_single_device --config mistral/7B_full_ppo_low_memory
I encountered the following error:

  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 1007, in train
    trajectory = self.generate_trajectory_batched(batch)
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 967, in generate_trajectory_batched
    trajectories.append(self.generate_trajectory(batch_input_ids))
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 843, in generate_trajectory
    with self.cache_ctx_manager(self.enable_kv_cache):
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 240, in <lambda>
    decoder_max_seq_len=self._tokenizer.max_seq_len
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

This issue occurred because tokenizer.max_seq_len in torchtune/recipes/configs/mistral/7B_full_ppo_low_memory.yaml was set to null.

To resolve this, I set tokenizer.max_seq_len in mistral/7B_full_ppo_low_memory.yaml to 32768, the same value as reward_and_value_model.max_seq_len.

After making this change, the command tune run ppo_full_finetune_single_device --config mistral/7B_full_ppo_low_memory got past this error, but then hit the OOM issue described above.

I would like to confirm whether this approach is valid. Could you please provide guidance on whether this is an appropriate fix?

Thank you in advance for your advice!

@SalmanMohammadi (Collaborator, Author)

Hi @Seoley - thank you so much for taking this on. Sorry for the issues here.

I've put up a PR to fix the issue you're coming across above. For now, rather than using tokenizer.max_seq_len=32768, try a much smaller value (512 should be sufficient here).

Support for different models for PPO is limited until we land general classifier support. To make testing easier, try the config I shared in this comment which uses a small Llama2 model for testing with a corresponding reward model I'd trained. This should require substantially less memory. Let me know if you run into any issues with it as it may need some slight updates.
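
For anyone reproducing this, the smaller context length can also be applied as a command-line override rather than by editing the YAML, e.g. tune run ppo_full_finetune_single_device --config mistral/7B_full_ppo_low_memory tokenizer.max_seq_len=512 (assuming the usual key=value override syntax of the tune CLI).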

@Seoley commented Feb 22, 2025

I've put up a PR (#2423) following your advice! But I've run into a new issue, and I would appreciate it if you could review it.
