Hi, I'm confused about the pooling strategy used here.
For training, you use avg pooling:
Line 52 in 9da9269:
```bash
--pooling_strategy avg \
```
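Just to make sure we're talking about the same thing, by avg pooling I mean a masked mean over the non-padding tokens, roughly like this sketch (my own illustrative helper, not code from this repo):

```python
import torch

def avg_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mean of the hidden states over non-padding positions."""
    # last_hidden_state: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)   # sum over non-padding positions
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of non-padding tokens
    return summed / counts                           # (batch, hidden)
```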
While for evaluation, you don't specify any pooling flag here:
Lines 99 to 105 in 9da9269:

2) evaluate on STS benchmark
```bash
BiLLM_START_INDEX=31 CUDA_VISIBLE_DEVICES=0 python eval_sts.py \
--model_name_or_path NousResearch/Llama-2-7b-hf \
--lora_name_or_path SeanLee97/bellm-llama-7b-nli \
--apply_bfloat16 0
```
so it should fall back to the default value `cls`, right?
Line 57 in 9da9269:
```python
parser.add_argument("--pooling_strategy", type=str, default='cls')
```
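If I read it correctly, cls pooling would simply take the hidden state at the first position, roughly like this (again just my own sketch, not the repo's code):

```python
import torch

def cls_pool(last_hidden_state: torch.Tensor) -> torch.Tensor:
    """Take the hidden state at position 0 as the sentence embedding."""
    return last_hidden_state[:, 0]  # (batch, hidden)
```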
As for the paper, you mention using the representative word as the pivot, so the pivot should be the last non-padding token, right? So I'm wondering which token I should use, or does it make no difference in a decoder-based model like Llama?
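For reference, the last-non-padding-token pooling I have in mind would look roughly like this (also just my own sketch, assuming right padding):

```python
import torch

def last_token_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Take the hidden state of the last non-padding token in each sequence."""
    last_idx = attention_mask.sum(dim=1) - 1  # index of the last real token per sequence
    batch_idx = torch.arange(last_hidden_state.size(0), device=last_hidden_state.device)
    return last_hidden_state[batch_idx, last_idx]  # (batch, hidden)
```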