Replies: 1 comment
- Yes, using the full (seq_len, embed_dim) embedding should provide more information.
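  For illustration only, here is a minimal PyTorch sketch of one possible way to consume the full (seq_len, embed_dim) representation instead of a plain mean: a small learned attention pool that scores each token and takes a weighted sum. The `AttentionPool` module and its dimensions are assumptions made up for this example, not something taken from the paper.

  ```python
  import torch
  import torch.nn as nn

  class AttentionPool(nn.Module):
      """Illustrative learned pooling (not the paper's method): score each
      token, softmax over the sequence axis, then take a weighted sum, so
      the pooled vector depends on which positions matter."""
      def __init__(self, embed_dim: int):
          super().__init__()
          self.score = nn.Linear(embed_dim, 1)

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          # x: (batch, seq_len, embed_dim)
          weights = torch.softmax(self.score(x), dim=1)  # (batch, seq_len, 1)
          return (weights * x).sum(dim=1)                # (batch, embed_dim)

  tokens = torch.randn(2, 16, 64)        # (batch, seq_len, embed_dim), dummy data
  pooled = AttentionPool(64)(tokens)
  print(pooled.shape)                    # torch.Size([2, 64])
  ```

  Other options along the same lines would be feeding the full sequence into a downstream attention or recurrent layer rather than pooling at all.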
- I noticed that the paper averages the token embeddings, transforming (seq_len, embed_dim) into (embed_dim).
Would this operation discard positional information from the tokens?
What would be a better approach to fully utilize the (seq_len, embed_dim) representation?
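  As a quick check of the premise, here is a minimal sketch (with made-up dimensions) showing that mean pooling is invariant to token order: any information carried purely by the arrangement of rows is averaged away.

  ```python
  import torch

  torch.manual_seed(0)
  seq_len, embed_dim = 8, 4
  tokens = torch.randn(seq_len, embed_dim)   # per-token embeddings (dummy data)

  # Averaging over the sequence axis: (seq_len, embed_dim) -> (embed_dim).
  pooled = tokens.mean(dim=0)

  # Shuffling the token order leaves the mean unchanged, so the pooled
  # vector cannot tell which token sat at which position.
  shuffled = tokens[torch.randperm(seq_len)]
  print(torch.allclose(pooled, shuffled.mean(dim=0)))  # True
  ```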