
Add LR scheduler to PPO recipe #2064

Open
SalmanMohammadi opened this issue Nov 25, 2024 · 6 comments
Labels: community help wanted (We would love the community's help completing this issue)

Comments

@SalmanMohammadi (Collaborator)

No description provided.

SalmanMohammadi added the community help wanted label on Nov 25, 2024
@Seoley commented Feb 18, 2025

Hi @SalmanMohammadi,

I attempted to solve this issue and made some modifications to ppo_full_finetune_single_device.py. I would like to verify whether my code is working correctly.

To verify this, is it enough to pass the three test suites (unit tests, recipe tests, regression tests)? If there are any additional steps I should take, I would appreciate your guidance on contributing to this issue.

Thanks in advance!

@pbontrager (Contributor)

Thank you for taking this on! You can push a draft PR and share it here so we can give better feedback. Ideally your changes should look similar to the LR scheduler solution in the SFT single-device recipe. For testing, you should mirror any tests that exist for the SFT version, include a sample run, and show a similar or better loss curve than with the scheduler turned off.
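
For reference, a minimal sketch of the kind of warmup-plus-decay schedule meant here, built on a generic LambdaLR helper rather than torchtune's actual API (the helper name and the step counts below are illustrative assumptions; in the recipe itself the scheduler would be instantiated from the config):

  # Sketch only: linear warmup to the base LR, then cosine decay to zero.
  # The function name and hyperparameters are hypothetical, not torchtune's.
  import math

  import torch
  from torch.optim.lr_scheduler import LambdaLR


  def cosine_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps):
      """Linearly warm up to the base LR, then decay to zero on a cosine curve."""

      def lr_lambda(step: int) -> float:
          if step < num_warmup_steps:
              return step / max(1, num_warmup_steps)
          progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
          return 0.5 * (1.0 + math.cos(math.pi * progress))

      return LambdaLR(optimizer, lr_lambda)


  # Usage: step the scheduler once per optimizer update. In the PPO recipe that
  # would be once per optimisation step, not once per generated trajectory batch.
  model = torch.nn.Linear(8, 8)
  optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
  scheduler = cosine_schedule_with_warmup(optimizer, num_warmup_steps=10, num_training_steps=100)

  for _ in range(100):
      optimizer.zero_grad()
      loss = model(torch.randn(2, 8)).sum()
      loss.backward()
      optimizer.step()
      scheduler.step()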

@Seoley commented Feb 19, 2025

@pbontrager Thank you for your advice!
I followed the procedure you suggested. Since the Mistral PPO recipe recommends an A100 for validation, I used one, but I ran into the following issue.

  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 1277, in recipe_main
    recipe.train()
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 1007, in train
    trajectory = self.generate_trajectory_batched(batch)
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 967, in generate_trajectory_batched
    trajectories.append(self.generate_trajectory(batch_input_ids))
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 843, in generate_trajectory
    with self.cache_ctx_manager(self.enable_kv_cache):
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/root/torchtune/torchtune/modules/common_utils.py", line 389, in local_kv_cache
    model.setup_caches(
  File "/root/torchtune/torchtune/modules/transformer.py", line 441, in setup_caches
    layer.setup_caches(
  File "/root/torchtune/torchtune/modules/transformer.py", line 62, in setup_caches
    self.attn.setup_cache(batch_size, dtype, max_seq_len=decoder_max_seq_len)
  File "/root/torchtune/torchtune/modules/attention.py", line 165, in setup_cache
    self.kv_cache = KVCache(
  File "/root/torchtune/torchtune/modules/kv_cache.py", line 39, in __init__
    "v_cache", torch.zeros(cache_shape, dtype=dtype), persistent=False
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_device.py", line 106, in __torch_function__
    return func(*args, **kwargs)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.01 GiB. GPU 0 has a total capacity of 79.14 GiB of which 2.59 GiB is free. Process 558874 has 76.54 GiB memory in use. Of the allocated memory 75.51 GiB is allocated by PyTorch, and 569.34 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Among the registered recipes, the only config that uses PPO is the Mistral 7B one, so I have no choice but to use Mistral 7B for validation. Although the Mistral 7B config recommends an A100, I still hit an OOM (out-of-memory) error.

The original code, without the LR scheduler added, also results in the same error. How should I resolve this?

I am running this on a single A100 GPU on RunPod.

System Specifications:

  • GPU: 1x A100 SXM (80GB VRAM)
  • MEM: 125GB RAM
  • CPU: 16 vCPU
  • Container Disk: 80GB

I would appreciate any advice. Thanks!
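
For what it's worth, the failing allocation is one of the per-layer KV cache tensors the recipe sets up before generation, and its size scales linearly with tokenizer.max_seq_len. A rough back-of-the-envelope estimate (the cache shape and the Mistral-7B-like geometry below are assumptions for illustration) shows why a 32768-token cache is so costly here:

  # Rough KV-cache size estimate; all numbers below are illustrative assumptions.
  # torchtune's KVCache holds a k_cache and a v_cache per attention layer, each
  # roughly of shape (batch_size, num_kv_heads, max_seq_len, head_dim).

  def kv_cache_bytes(batch_size, num_layers, num_kv_heads, head_dim,
                     max_seq_len, bytes_per_elem=2):  # bf16 -> 2 bytes
      per_tensor = batch_size * num_kv_heads * max_seq_len * head_dim * bytes_per_elem
      return 2 * num_layers * per_tensor  # k_cache + v_cache for every layer

  # Assumed Mistral-7B-like geometry: 32 layers, 8 KV heads, head_dim 128.
  for seq_len in (32_768, 512):
      gib = kv_cache_bytes(batch_size=4, num_layers=32, num_kv_heads=8,
                           head_dim=128, max_seq_len=seq_len) / 2**30
      print(f"max_seq_len={seq_len}: ~{gib:.2f} GiB of KV cache")

With an assumed batch of 4 this comes to roughly 16 GiB at max_seq_len=32768 versus about 0.25 GiB at 512, on top of the several models the PPO recipe already keeps in memory, which is consistent with the suggestion below to use a much smaller value.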

@Seoley commented Feb 19, 2025

Additionally, before encountering the OOM issue mentioned above, I faced another error. I would like to confirm whether my solution to this issue is appropriate.

When I ran the following command: tune run ppo_full_finetune_single_device --config mistral/7B_full_ppo_low_memory
I encountered the following error:

  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 1007, in train
    trajectory = self.generate_trajectory_batched(batch)
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 967, in generate_trajectory_batched
    trajectories.append(self.generate_trajectory(batch_input_ids))
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 843, in generate_trajectory
    with self.cache_ctx_manager(self.enable_kv_cache):
  File "/root/torchtune/recipes/ppo_full_finetune_single_device.py", line 240, in <lambda>
    decoder_max_seq_len=self._tokenizer.max_seq_len
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

This issue occurred because tokenizer.max_seq_len in torchtune/recipes/configs/mistral/7B_full_ppo_low_memory.yaml was set to null.

To resolve this, I set tokenizer.max_seq_len in mistral/7B_full_ppo_low_memory.yaml to 32768, the same value as reward_and_value_model.max_seq_len.

After making this change, the command tune run ppo_full_finetune_single_device --config mistral/7B_full_ppo_low_memory got past this error, but then hit the OOM issue described above.

I would like to confirm whether this approach is valid. Could you please provide guidance on whether this is an appropriate fix?

Thank you in advance for your advice!

@SalmanMohammadi (Collaborator, Author)

Hi @Seoley - thank you so much for taking this on. Sorry for the issues here.

I've put up a PR to fix the issue you're coming across above. For now, rather than using tokenizer.max_seq_len=32768, try a much smaller value (512 should be sufficient here).

Support for different models for PPO is limited until we land general classifier support. To make testing easier, try the config I shared in this comment which uses a small Llama2 model for testing with a corresponding reward model I'd trained. This should require substantially less memory. Let me know if you run into any issues with it as it may need some slight updates.
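
For anyone reproducing this, the smaller context length can also be applied as a command-line override rather than by editing the YAML, e.g. tune run ppo_full_finetune_single_device --config mistral/7B_full_ppo_low_memory tokenizer.max_seq_len=512 (assuming the usual key=value override syntax of the tune CLI).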

@Seoley commented Feb 22, 2025

I've put up a PR (#2423) following your advice! But I've run into a new issue, and I would appreciate it if you could review it.
