Add LR scheduler to PPO recipe #2064
Hi @SalmanMohammadi, I attempted to solve this issue and made some modifications to the PPO recipe. To make sure the changes are ready, do I only need to pass the three tests (unit test, recipe test, regression test)? If there are any additional steps I should take, I would appreciate your guidance on contributing to this issue. Thanks in advance!
Thank you for taking this on! You can push a draft PR and share it here so we can give better feedback. Ideally your changes should look similar to the LR scheduler solution in SFT single device. For testing, you should match any tests that exist for the SFT version, and include a sample run with the scheduler enabled showing a similar or better loss curve than with it turned off.
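For illustration, here is a minimal sketch of that pattern (scheduler built during setup, stepped once per optimizer step). It uses plain PyTorch rather than torchtune's own scheduler utilities, and the model, optimizer, and step counts are placeholders:

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

# Placeholder model and optimizer; in the recipe these come from the config.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

num_warmup_steps = 10      # assumed values; normally read from the config
num_training_steps = 100

def warmup_cosine(step: int) -> float:
    # Linear warmup followed by cosine decay, similar in spirit to the
    # cosine-with-warmup scheduler used in the SFT single-device recipe.
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

lr_scheduler = LambdaLR(optimizer, lr_lambda=warmup_cosine)

for step in range(num_training_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 8)).pow(2).mean()  # dummy loss for the sketch
    loss.backward()
    optimizer.step()
    lr_scheduler.step()  # step the scheduler once per optimizer step
```

In the actual recipe the scheduler would be instantiated from the config alongside the optimizer, with `num_training_steps` derived from the number of PPO steps, but the step-per-optimizer-update pattern stays the same.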
@pbontrager Thank you for your advice!
While running one of the registered PPO configs, I encountered an out-of-memory (OOM) error. The original code, without the LR scheduler added, results in the same error. How should I resolve this? I am running on a single A100 GPU on RunPod.
I would appreciate any advice. Thanks!
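As a generic (not torchtune-specific) starting point for debugging the OOM, here is a small sketch of how one might log GPU memory around the forward and backward passes; the placement comments are hypothetical and assume a CUDA device:

```python
import torch

def log_cuda_memory(tag: str) -> None:
    # Print allocated/reserved/peak memory in GiB to help locate where the OOM happens.
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    peak = torch.cuda.max_memory_allocated() / 2**30
    print(f"[{tag}] allocated={allocated:.2f} GiB reserved={reserved:.2f} GiB peak={peak:.2f} GiB")

# Hypothetical placement inside a training step:
# log_cuda_memory("before forward")
# loss = model(batch)
# log_cuda_memory("after forward")
# loss.backward()
# log_cuda_memory("after backward")
```

If the peak lands in the forward/backward pass, the usual levers are a smaller batch size, gradient accumulation, activation checkpointing, or lower-precision dtypes.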
Additionally, before encountering the OOM issue mentioned above, I faced another error, and I would like to confirm whether my solution to it is appropriate. The error occurred when I ran the recipe command.
The issue was caused by one of the values used by the recipe. To resolve it, I set that value differently, and after making the change the command ran without the error. I would like to confirm whether this approach is valid. Could you please provide guidance on whether this is an appropriate fix? Thank you in advance for your advice!
Hi @Seoley - thank you so much for taking this on, and sorry for the issues here. I've put up a PR to fix the issue you're coming across above. Support for different models for PPO is limited until we land general classifier support. To make testing easier, try the config I shared in this comment, which uses a small Llama2 model with a corresponding reward model I'd trained; this should require substantially less memory. Let me know if you run into any issues with it, as it may need some slight updates.
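If it helps while iterating on the smaller test config, one generic way to apply quick overrides without editing the YAML in place is via OmegaConf; the file name and keys below are hypothetical placeholders, not the actual config referenced above:

```python
from omegaconf import OmegaConf

# Hypothetical file name; substitute the small Llama2 PPO test config shared above.
cfg = OmegaConf.load("ppo_small_llama2.yaml")

# Hypothetical overrides for a quick, low-memory smoke test.
overrides = OmegaConf.from_dotlist([
    "batch_size=2",
    "ppo_epochs=1",
    "lr_scheduler.num_warmup_steps=10",
])
cfg = OmegaConf.merge(cfg, overrides)

# Inspect the merged config before launching a run.
print(OmegaConf.to_yaml(cfg))
```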
I've put up a PR following your advice! However, I have run into a new issue, and I would appreciate it if you could review it: #2423