confused with the pooling strategy? #5

Description

@rxqy

Hi, I'm confused with the pooling strategy you used here.

For training, you use the avg pooling strategy:

--pooling_strategy avg \

While for evaluation, you are not specifying any pooling flag here:

BeLLM/README.md

Lines 99 to 105 in 9da9269

2) evaluate on STS benchmark
```bash
BiLLM_START_INDEX=31 CUDA_VISIBLE_DEVICES=0 python eval_sts.py \
--model_name_or_path NousResearch/Llama-2-7b-hf \
--lora_name_or_path SeanLee97/bellm-llama-7b-nli \
--apply_bfloat16 0
```

so this should fall back to the default value, cls, right?
parser.add_argument("--pooling_strategy", type=str, default='cls')

As for the paper, you mentioned that you used the representative word as the pivot, so that should be the last non-padding token, right? So I'm wondering which token I should use, or does it make no difference in a decoder-based model like LLaMA?
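To make the question concrete, here is a minimal sketch of how the three strategies under discussion (cls, avg, and last non-padding token) typically differ. This is a hypothetical helper written with numpy for illustration, not the repo's actual implementation, and the function name `pool` is made up:

```python
import numpy as np

def pool(hidden_states, attention_mask, strategy):
    """Illustrative pooling over per-token embeddings.

    hidden_states:  (seq_len, dim) array of token embeddings
    attention_mask: (seq_len,) array, 1 for real tokens, 0 for padding
    strategy:       'cls' | 'avg' | 'last'
    """
    mask = attention_mask.astype(bool)
    if strategy == "cls":
        # first token of the sequence
        return hidden_states[0]
    if strategy == "avg":
        # mean over non-padding tokens only
        return hidden_states[mask].mean(axis=0)
    if strategy == "last":
        # last non-padding token -- the natural "pivot" in a causal model,
        # since it is the only position that has attended to the full input
        last_idx = np.nonzero(mask)[0][-1]
        return hidden_states[last_idx]
    raise ValueError(f"unknown strategy: {strategy}")
```

With a toy sequence of four tokens where the last one is padding, cls returns row 0, avg averages rows 0-2, and last returns row 2, so the three strategies generally produce different embeddings unless the model was trained to make them agree.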
