Replies: 1 comment
- Yes, using the full (seq_len, embed_dim) embedding should provide more information.
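  For illustration only, here is a minimal PyTorch sketch of one possible way to consume the full (seq_len, embed_dim) representation instead of a plain mean: a small learned attention pool that scores each token and takes a weighted sum. The `AttentionPool` module and its dimensions are assumptions made up for this example, not something taken from the paper.

  ```python
  import torch
  import torch.nn as nn

  class AttentionPool(nn.Module):
      """Illustrative learned pooling (not the paper's method): score each
      token, softmax over the sequence axis, then take a weighted sum, so
      the pooled vector depends on which positions matter."""
      def __init__(self, embed_dim: int):
          super().__init__()
          self.score = nn.Linear(embed_dim, 1)

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          # x: (batch, seq_len, embed_dim)
          weights = torch.softmax(self.score(x), dim=1)  # (batch, seq_len, 1)
          return (weights * x).sum(dim=1)                # (batch, embed_dim)

  tokens = torch.randn(2, 16, 64)        # (batch, seq_len, embed_dim), dummy data
  pooled = AttentionPool(64)(tokens)
  print(pooled.shape)                    # torch.Size([2, 64])
  ```

  Other options along the same lines would be feeding the full sequence into a downstream attention or recurrent layer rather than pooling at all.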
- I noticed that the paper averages the token embeddings, transforming (seq_len, embed_dim) into (embed_dim).
Would this operation discard positional information from the tokens?
What would be a better approach to fully utilize the (seq_len, embed_dim) representation?
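  As a quick check of the premise, here is a minimal sketch (with made-up dimensions) showing that mean pooling is invariant to token order: any information carried purely by the arrangement of rows is averaged away.

  ```python
  import torch

  torch.manual_seed(0)
  seq_len, embed_dim = 8, 4
  tokens = torch.randn(seq_len, embed_dim)   # per-token embeddings (dummy data)

  # Averaging over the sequence axis: (seq_len, embed_dim) -> (embed_dim).
  pooled = tokens.mean(dim=0)

  # Shuffling the token order leaves the mean unchanged, so the pooled
  # vector cannot tell which token sat at which position.
  shuffled = tokens[torch.randperm(seq_len)]
  print(torch.allclose(pooled, shuffled.mean(dim=0)))  # True
  ```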