Hi all,

I am trying to understand which version of Flash Attention llama.cpp uses. I am asking because in `/workspace/llama.cpp/tests/test-backend-ops.cpp` I see:

So it looks to me like we can have `nb != kv`. My understanding of the algorithm is that Q, K, and V are all R^(n×d) matrices. Could you point me to the version of the algorithm you are using?
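For context, here is a minimal NumPy sketch (not from the llama.cpp code, and with illustrative shape values I chose myself) of plain scaled dot-product attention where the number of query rows differs from the number of key/value rows, which is the `nb != kv` situation in question, e.g. during incremental decoding against a KV cache:

```python
import numpy as np

# Illustrative shapes (assumptions, not values from the llama.cpp tests):
# nb = number of query rows, kv = number of key/value rows, d = head dimension.
d, nb, kv = 64, 4, 128

rng = np.random.default_rng(0)
Q = rng.standard_normal((nb, d))   # queries: R^(nb x d)
K = rng.standard_normal((kv, d))   # keys:    R^(kv x d)
V = rng.standard_normal((kv, d))   # values:  R^(kv x d)

# Plain scaled dot-product attention; Flash Attention computes the same
# result blockwise without materializing the full (nb x kv) score matrix.
scores = Q @ K.T / np.sqrt(d)                        # (nb, kv)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
out = weights @ V                                    # (nb, d)

print(out.shape)  # (4, 64)
```

The output always has one row per query, so nothing in the math requires `nb == kv`; the R^(n×d) presentation in the paper is just the self-attention special case.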
Thanks,
Giuseppe