Why are three special tokens ignored in the label? #53

Pride-Huang · 2024-11-27T09:47:47Z

The label consists only of image tokens, with the special token <|image start|> ignored. Why is the SFT loss computed this way?

Masaaki-75 · 2024-12-10T09:36:06Z

This might partly answer the question: https://github.com/baaivision/Emu3/blob/main/emu3/mllm/processing_emu3.py#L178-L183

During inference, the boi_token (which is "<|image start|>" in string form), the resolution information, and the img_token (which is "<|image token|>") are provided directly as the starting prefix for generation, so the model never has to predict them. (BTW, the mismatch between the string form and the variable naming is confusing and annoying, lol)
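A rough sketch of what such a prefix could look like. The two token strings are the ones quoted above; the helper name `image_prefix` and the "H*W" encoding of the resolution are assumptions for illustration, not the actual Emu3 code (see the linked lines for that).

```python
# Token strings as quoted in this thread.
BOI_TOKEN = "<|image start|>"
IMG_TOKEN = "<|image token|>"


def image_prefix(height: int, width: int) -> str:
    """Build the text that is handed to the model before it generates an image.

    Hypothetical helper: the "H*W" resolution format is an assumption.
    """
    return f"{BOI_TOKEN}{height}*{width}{IMG_TOKEN}"


prefix = image_prefix(64, 64)
print(prefix)
```

Since everything in this prefix is supplied by the caller at inference time, there is arguably little to gain from training the model to predict it.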

That said, I am also curious why the authors limited the supervision to token ids between the first visual token id and the last visual token id, while ignoring eol_token, eof_token, and eoi_token.
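To make the masking concrete, here is a minimal sketch of labels that supervise only ids inside the visual-token range. All token ids and the helper `build_labels` are hypothetical, picked for illustration; only the value -100 is standard (the default `ignore_index` of PyTorch's cross-entropy loss).

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss

# Hypothetical token ids, chosen only for illustration.
BOI, IMG, EOL, EOF, EOI = 1, 2, 3, 4, 5
FIRST_VISUAL_ID, LAST_VISUAL_ID = 100, 199


def build_labels(input_ids):
    """Keep ids inside the visual range as targets; mask everything else."""
    return [
        tok if FIRST_VISUAL_ID <= tok <= LAST_VISUAL_ID else IGNORE_INDEX
        for tok in input_ids
    ]


# A 2x2 "image": prefix tokens, two rows each closed by EOL, then EOF and EOI.
input_ids = [BOI, IMG, 101, 102, EOL, 103, 104, EOL, EOF, EOI]
labels = build_labels(input_ids)
print(labels)
# [-100, -100, 101, 102, -100, 103, 104, -100, -100, -100]
```

Under this scheme the structural tokens (EOL, EOF, EOI in the sketch) contribute nothing to the loss, which is exactly the part the question is about.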
