feat: add CE-U loss #72

Open · wants to merge 2 commits into main

Conversation

@Atry commented Mar 17, 2025

What does this PR do?

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Have you gone through the documentation on adding new components?
  • Did you make sure to update the documentation with your changes? Here are the pointers to the documentation guidelines.

@Atry marked this pull request as ready for review March 17, 2025 07:39
@molereddy (Collaborator)

Hi, thank you very much! We'll try to get this in soon.
Can you make some changes? First: src/trainer/utils.py is intended to contain functionality used across multiple trainer files. Since your CE-U functions are specific to your method, I think they are more appropriately placed in ceu.py. Unless you have a reason to keep them where they are, can you make this change?
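
For context: CE-U, as described in the linked paper, replaces the standard cross-entropy target with the model's own next-token distribution after the ground-truth token's logit has been masked out, so probability mass is pushed away from the token being unlearned. Below is a minimal sketch of the kind of function ceu.py might contain, assuming that construction; the function name and details are illustrative, not this PR's actual code.

```python
import torch


def ce_u_loss(logits: torch.Tensor, labels: torch.Tensor,
              ignore_index: int = -100) -> torch.Tensor:
    """Sketch of a CE-U style unlearning loss (illustrative, not this PR's code)."""
    # Shift so position t predicts token t+1, as in causal LM training.
    logits = logits[:, :-1, :].contiguous()
    labels = labels[:, 1:].contiguous()

    valid = (labels != ignore_index).float()
    safe_labels = labels.clamp(min=0)  # placeholder index for ignored positions

    # Soft target: the model's own distribution with the ground-truth token's
    # logit masked to -inf. Detached so gradients flow only via the prediction.
    masked = logits.detach().clone()
    masked.scatter_(-1, safe_labels.unsqueeze(-1), float("-inf"))
    target = masked.softmax(dim=-1)

    # Cross-entropy between the soft target and the model's prediction,
    # averaged over non-ignored token positions.
    log_probs = logits.log_softmax(dim=-1)
    token_loss = -(target * log_probs).sum(dim=-1)
    return (token_loss * valid).sum() / valid.sum().clamp(min=1.0)
```

Detaching the target matters here: otherwise gradients would also flow through the target distribution itself rather than only through the model's prediction.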

@Atry (Author) commented Mar 19, 2025

@molereddy Sure. Fixed!

@molereddy (Collaborator)

Thank you. Can you run your method and populate the relevant tables in docs/results.md (for reproducibility, not comparison)?

@molereddy (Collaborator)

Also, please link your paper in one of the files. In the future we plan to consolidate all documentation links in one place; for now, please make your link easily available.

@Atry (Author) commented Mar 20, 2025 via email

@molereddy (Collaborator)

Yes, these models are different, and with some other standardizations we've made to the code, results are expected to differ.
For now, please submit results for experiments performed with the same common hyperparameters as the examples in results.md (these might not be your best results; the aim is reproducibility).

We are planning to create a leaderboard with each method's best hyperparameters in the future, but are currently unsure how to keep hyperparameter tuning fair across methods.

@Atry (Author) commented Mar 20, 2025

I have two questions:

  1. Could you document the changes between https://github.com/locuslab/tofu and this repository?
  2. Can I specify a default learning rate other than 1e-5 for my unlearning method?

@molereddy (Collaborator)

  1. I can get back to you later with an exhaustive list, but the starter models are definitely different, having been re-trained from the base models with different experimental setups. We've also found that how you use distributed training affects the results, similar to this issue: About the deepspeed #36. Other modified places include batch data collation and data pre-processing, e.g. in the function preprocess_pretraining_instance (though these changes should not affect results, AFAIK). I will confirm the exact details in a while.
  2. It's not ideal, but currently we are keeping hyperparameters the same across methods in results.md for simplicity of documentation, so please update the results documentation with those shared hyperparameters. We hope to make this documentation cleaner soon. For now, you may still specify your desired hyperparameters in your trainer config file (but commented out), and we can uncomment them in the future; see the sketch below.
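
A sketch of what that commented-out override might look like in a trainer config. The file path, handler name, and keys below are assumptions for illustration, not necessarily the repository's actual schema:

```yaml
# configs/trainer/CEU.yaml -- hypothetical path and names
handler: CEUTrainer
args:
  # Shared defaults used across methods in results.md:
  learning_rate: 1.0e-5
  num_train_epochs: 10
  # Method-specific values, kept commented out for now as discussed:
  # learning_rate: 2.0e-6
  # num_train_epochs: 9
```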

@Atry (Author) commented Mar 20, 2025

But a 1e-5 learning rate / 10 epochs is far too aggressive for CE-U, always resulting in 0.0 model utility. Do you really want me to add those numbers to results.md?

Also, I found that a 4e-5 learning rate / 5 epochs produces reasonable results with https://github.com/locuslab/tofu, but not with this repository. Is there extra gradient clipping or regularization applied in https://github.com/locuslab/tofu?

@molereddy (Collaborator)

I see the issue. Let's not add that to results.md, then. We definitely have to change the way results are reported. For now, save your results with your hyperparams so that they can be added later.

@Atry (Author) commented Mar 20, 2025

A 2e-6 learning rate / 9 epochs produces reasonable results with this repository, but they are still different from https://github.com/locuslab/tofu. I really need to figure out why.

2e-6 learning rate / 9 epochs against the open-unlearning/tofu_Llama-2-7b-chat-hf_full model on the 5% forget split, with training scripts from this repository:

{
    "forget_Q_A_PARA_Prob": 0.021767907157372975,
    "forget_Q_A_PERT_Prob": 0.018082396532106942,
    "forget_Q_A_Prob": 0.03332012313745508,
    "forget_Q_A_ROUGE": 0.10623593093943053,
    "forget_quality": 0.0220923622208589,
    "forget_truth_ratio": 0.7617224305194077,
    "model_utility": 0.4612301673662931,
    "ra_Q_A_PERT_Prob": 0.059219124874410535,
    "ra_Q_A_Prob": 0.22738571502268315,
    "ra_Q_A_Prob_normalised": 0.5607812059531567,
    "ra_Q_A_ROUGE": 0.741,
    "ra_Truth_Ratio": 0.7369656421149354,
    "retain_Q_A_PARA_Prob": 0.0876370457559824,
    "retain_Q_A_PERT_Prob": 0.051889002596712086,
    "retain_Q_A_Prob": 0.21705962190404535,
    "retain_Q_A_ROUGE": 0.313057008174422,
    "retain_Truth_Ratio": 0.37752008027804096,
    "wf_Q_A_PERT_Prob": 0.04059571723577744,
    "wf_Q_A_Prob": 0.12972804916521105,
    "wf_Q_A_Prob_normalised": 0.5205829499369004,
    "wf_Q_A_ROUGE": 0.8468660968660968,
    "wf_Truth_Ratio": 0.679617672811441
}

2e-6 learning rate / 9 epochs against the locuslab/tofu_ft_llama2-7b model on the 5% forget split, with training scripts from https://github.com/locuslab/tofu:

| Metric | Value |
| --- | --- |
| ROUGE Real Authors | 0.8746666666666667 |
| Prob. Real Authors | 0.5717868595591832 |
| Truth Ratio Real Authors | 0.7451146866249074 |
| ROUGE Real World | 0.8675213675213675 |
| Prob. Real World | 0.540686736479648 |
| Truth Ratio Real World | 0.6989915599479447 |
| ROUGE Retain | 0.5602712624945209 |
| Prob. Retain | 0.5270367434678659 |
| Truth Ratio Retain | 0.4211915064832616 |
| ROUGE Forget | 0.23312532598788058 |
| Prob. Forget | 0.1334090231075485 |
| Truth Ratio Forget | 0.7307127435592052 |
| Model Utility | 0.6112643587617107 |
| Forget Quality | 0.0020827633834865906 |
| Method | ceu_ignore_first_token |
| Submitted By | Bo Yang |

You can see that model utility is worse and forget quality is better with the training scripts from this repository.
Do you have any clue why?

@Atry (Author) commented Mar 20, 2025

I think the difference in results is partly due to the different fine-tuned models:

I ran CE-U with a 2e-6 learning rate / 9 epochs against the locuslab/tofu_ft_llama2-7b model on the 5% forget split, with training scripts from this repository:

{
    "forget_Q_A_PARA_Prob": 0.04816086617065594,
    "forget_Q_A_PERT_Prob": 0.03823621151647967,
    "forget_Q_A_Prob": 0.08244071683104266,
    "forget_Q_A_ROUGE": 0.20805565648330746,
    "forget_quality": 0.016258459276759563,
    "forget_truth_ratio": 0.7585021939583438,
    "model_utility": 0.5686233127601279,
    "ra_Q_A_PERT_Prob": 0.05605135237565264,
    "ra_Q_A_Prob": 0.21465927125886083,
    "ra_Q_A_Prob_normalised": 0.5623984756305082,
    "ra_Q_A_ROUGE": 0.8483333333333333,
    "ra_Truth_Ratio": 0.734670350084031,
    "retain_Q_A_PARA_Prob": 0.150980517314747,
    "retain_Q_A_PERT_Prob": 0.0847499375215266,
    "retain_Q_A_Prob": 0.39507232803851366,
    "retain_Q_A_ROUGE": 0.5139141246249014,
    "retain_Truth_Ratio": 0.40373489258576045,
    "wf_Q_A_PERT_Prob": 0.03368778126538988,
    "wf_Q_A_Prob": 0.10864872443808131,
    "wf_Q_A_Prob_normalised": 0.5228066567884767,
    "wf_Q_A_ROUGE": 0.8532763532763532,
    "wf_Truth_Ratio": 0.679784481414584
}

In short:

  • open-unlearning/tofu_Llama-2-7b-chat-hf_full model + locuslab/open-unlearning code:
    • model utility: 0.4612301673662931
    • forget quality: 0.0220923622208589
  • locuslab/tofu_ft_llama2-7b model + locuslab/open-unlearning code:
    • model utility: 0.5686233127601279
    • forget quality: 0.016258459276759563
  • locuslab/tofu_ft_llama2-7b model + locuslab/tofu code:
    • model utility: 0.6112643587617107
    • forget quality: 0.0020827633834865906

@Atry (Author) commented Mar 20, 2025

Anyway, the benchmark here and locuslab/tofu are not comparable. They should be considered different benchmarks.

@Dornavineeth (Collaborator)

Hi @Atry

Thank you for taking the time to add your method to our repository. We've added guidelines on contributing to our repository, including how to share your settings, results, and reproducible scripts. Could you review them and update this PR accordingly?

This should help others use and reproduce your work in their studies.

Quick summary of the contribution guide:

  1. Create a folder under community/methods/.
  2. Include your method details, results, hyperparameter search scripts, and reproducibility steps.
  3. Update the leaderboard with your results.

We've provided a template folder to get you started.
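
A hypothetical layout mirroring that template (the file names here are illustrative; the contributing guide defines what is actually required):

```
community/methods/CE-U/   # hypothetical method folder name
├── README.md             # method details, paper link, chosen hyperparameters
├── run.sh                # reproducibility / hyperparameter-search script
└── results.md            # results to surface on the leaderboard
```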

Let us know if you have any questions!

@molereddy (Collaborator) commented Mar 28, 2025

@Atry we agree that the numbers in these versions of TOFU are not directly comparable. But you can tune the hyperparameters again to find a setting that gives your best results.

You can now report those best results on our leaderboard as @Dornavineeth mentioned, instead of just the default repro parameters as before. This includes changing the number of epochs, early stopping, etc., as you mentioned.

Please merge the latest changes into your branch, make the above-mentioned updates and we'll get this in ASAP!

@molereddy (Collaborator)

Following up here: it would be great if you could integrate the latest changes, document your contributions and add results.

I understand that for results you would have to re-run experiments and tune things again, since results are not comparable. If that is not possible, make the updates and test the code, reporting which models/datasets it works on, and we can get this in without the results.

@Atry (Author) commented Apr 14, 2025 via email
