Add gru component #83
Conversation
src/trainer/unlearn/gru.py
Outdated
g1g1 = self.compute_total_gradient_dot_product(g1, self.structure_map, g1, self.structure_map)
gg1 = self.dotp_retain
print(gg1 / g1g1)
Can you clean up the code, e.g. remove this print and any other unnecessary prints/comments from the code?
Thanks for the feedback! I've removed unnecessary print statements. Let me know if anything else needs updating!
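As an aside, the `compute_total_gradient_dot_product` call quoted above suggests a dot product between two gradient sets, taken parameter by parameter over flattened tensors. A minimal sketch of that idea (an illustration only, not the PR's actual implementation; the structure maps used in the real helper are omitted here):

```python
import torch

# Illustrative sketch, not the PR's implementation: compute the dot
# product of two gradient sets by flattening each parameter's gradient
# and summing the per-parameter dot products.
def total_gradient_dot_product(grads1, grads2):
    total = 0.0
    for g1, g2 in zip(grads1, grads2):
        total += torch.dot(g1.flatten(), g2.flatten()).item()
    return total

grads_a = [torch.ones(2, 3), torch.full((4,), 2.0)]
grads_b = [torch.ones(2, 3), torch.ones(4)]
print(total_gradient_dot_product(grads_a, grads_b))  # 6*1 + 4*2 = 14.0
```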
Hi! Sorry for jumping into the conversation, and really interesting method by the way! I tried to reproduce the TOFU forget10 results of GRU on Llama-3.2-1B-Instruct, following the example command you gave above, but the results don't seem to match. I ran it twice, with the same outcome both times. I also ran the same configuration with RMU. So I'm just wondering what your exact evaluation settings were, and whether you could double-check them, especially regarding model utility, which seems to be the main argument for GRU.
Hi @rtx-1999, thank you for your interest in our method and for taking the time to test it out! We appreciate your efforts. Let me address your points:

We've uploaded an eval output JSON log file for reference.

Regarding the RMU results you mentioned, we'd like to kindly clarify that the numbers in the table were taken directly from the original table in results.md. We didn't conduct any additional evaluations for RMU ourselves. The only row we added to the table was for GradAscent w/ GRU, which reflects the results from our work.

Finally, we want to note that the results in our table were produced using the common hyperparameters provided by the open-unlearning authors in results.md. These were used for reproducibility purposes, and the table isn't intended for direct comparison between methods, as the authors themselves have stated.

Once again, we truly appreciate your interest in our work and your efforts to test it. Let us know if there's anything more we can assist with!
Thanks for your response and the clarification on the RMU results! Just to clarify, making a direct comparison between methods wasn't my purpose; I'm just trying to make sure I have the same evaluation setup as yours. Here are the hydra configs, my evaluation script, eval.log, TOFU_EVAL.json, GRU.log and trainer_state.json, zipped in settings.zip, as the yaml format is not supported for attachments here. As I checked, I used the exact settings for unlearning:
And for the evaluation script, I followed the default configs in tofu_unlearn.sh, equivalently:
@rtx-1999 I see that you are using only 1 device for unlearning. Using a different number of GPUs can affect the results quite a lot, as effective hyperparameters such as batch size change with the device count. Additionally, even if we manage to keep the effective hyperparameters the same, using DeepSpeed can lead to different results. @yuuee-www can you confirm how many GPUs you used for unlearning?
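To make the point about effective hyperparameters concrete (with hypothetical numbers): the effective batch size is the per-device batch size times the number of devices times the gradient accumulation steps, so changing the GPU count alone changes it even when the per-device settings are identical.

```python
# Hypothetical numbers for illustration only.
def effective_batch_size(per_device_batch_size, num_devices, grad_accum_steps):
    # Each optimizer step consumes this many examples in total.
    return per_device_batch_size * num_devices * grad_accum_steps

# The same per-device settings give different effective batch sizes
# on 2 GPUs vs. 1 GPU:
print(effective_batch_size(4, 2, 4))  # 32
print(effective_batch_size(4, 1, 4))  # 16
```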
Hi @yuuee-www, Thank you for taking the time to add your method to our repository. We've added guidelines on contributing to our repository, including how to share your settings, results, and reproducible scripts. Could you review them and update this PR accordingly? This should help others use and reproduce your work in their studies. Quick summary of the contribution guide:
We've provided a template folder to get you started. Let us know if you have any questions!
Hi @Dornavineeth, Understood. We’ll follow the guidelines and update the PR soon. Cc @rtx-1999. Thanks!
A couple of requests:
For your leaderboard results, you may report your best-performing

If you have a better way of doing this, or face difficulties with this plan, do let us know.
…pe' configuration with NPO w/ GRU method integration.
I see that ruff quality tests are failing here. Make sure to follow the instructions in https://github.com/locuslab/open-unlearning/blob/main/docs/contributing.md#create-a-pull-request to ensure you are in the dev env and can apply ruff formatting.
src/trainer/unlearn/gru.py
Outdated
self.gradient_accumulation_steps = kwargs["args"].gradient_accumulation_steps
if self.ref_model is None and self.forget_loss_type == "NPO":
    self.ref_model = self._prepare_ref_model(self.model)
    # self.ref_model = self.model.to(self.args.device)
clean code up
src/trainer/unlearn/gru.py
Outdated
flattened_grads1 = flattened_grads1.to('cuda')
flattened_grads2 = flattened_grads2.to('cuda')

# for ((name1, shape1), (name2, shape2)) in zip(structure_map1, structure_map2):
clean code up
src/trainer/unlearn/gru.py
Outdated
def pipeline(self):
    if self.dotp_retain < 0:
        # print("dotp_retain:", self.dotp_retain)
clean up
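For context on the `dotp_retain < 0` check quoted above: GRU-style rectification only kicks in when the update direction conflicts with the retain gradient. A minimal sketch of such a projection (an illustration of the general idea, not the PR's exact code):

```python
import torch

def rectify(g_update, g_retain):
    # When the update conflicts with the retain gradient (negative dot
    # product), remove the conflicting component by projecting it out;
    # otherwise leave the update unchanged. Sketch only, not the PR's code.
    dotp = torch.dot(g_update, g_retain)
    if dotp < 0:
        g_update = g_update - (dotp / g_retain.norm() ** 2) * g_retain
    return g_update

g_u = torch.tensor([1.0, -1.0])
g_r = torch.tensor([0.0, 1.0])   # conflicting: dot product is -1
rectified = rectify(g_u, g_r)    # conflicting component removed
```

After the projection, the rectified update is orthogonal to the retain gradient, so descending along it no longer increases the retain loss to first order.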
community/methods/GRU/README.md
Outdated
# Results

To replicate your results, provide a `run.sh` script that contains all necessary commands to reproduce the final results. Ensure the script is well-documented.
You can:
- remove all the template instructions from this doc which we've written ("provide a concise summary", etc.)
- add your results here
We can get this in once:
community/methods/GRU/README.md
Outdated
- [ ] **Hyperparameters & Search Space:** Specify key hyperparameters, their search ranges, number of trials, etc.
- [ ] **Computational Setup:** Mention the type and number of GPUs used.
- [ ] **DeepSpeed Configuration:** If any modifications were made to the default DeepSpeed config, specify them here. (You may include the config as a code block.)
Also, we use DeepSpeed for distributed training in our code and experiments, and provide documentation with that in mind. Please test whether your code works with DeepSpeed, and if not (looking at the code, it seems it may not), add a note to your documentation describing the settings in which your method works. If DeepSpeed is not supported, mention that explicitly.
What does this PR do?
This PR integrates Gradient Rectified Unlearning (GRU) with the Gradient Ascent (GA) unlearning objective. Due to time constraints, we currently provide GRU with GA only; integrations with other unlearning objectives, such as NPO, SimNPO, WGA, and GD, are planned and will be added in subsequent updates as soon as possible. For more detailed information, please refer to the original paper on arXiv.
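As background on the GA objective (a sketch of the general idea, not this PR's code): gradient ascent simply negates the standard language-modeling loss on the forget set, so running ordinary gradient descent on the negated loss ascends the original one.

```python
import torch
import torch.nn.functional as F

def gradient_ascent_loss(logits, labels):
    # Negated cross-entropy: descending on this loss ascends the
    # original loss on the forget data. Sketch only.
    return -F.cross_entropy(logits, labels)

# Toy example: one forget-set token with a 3-way vocabulary.
logits = torch.tensor([[2.0, 0.5, 0.1]])
labels = torch.tensor([0])
loss = gradient_ascent_loss(logits, labels)
# Cross-entropy is always positive, so the GA loss is always negative.
```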
Fixes # (issue)
Sample Implementation and Some Reproduction Results
We ran some examples following the common hyperparameters in docs/evaluation.md for reproducibility purposes. As DeepSpeed does not currently support gradient manipulation, it is not used in our implementation. An example command for running GRU on forget10 is as follows:
The reproducible results on the Llama-3.2-1B-Instruct architecture:
TOFU unlearning on the Llama-3.2-1B-Instruct architecture