Saving state
#5440
Hi!
Can it be converted to something like this to reduce time to get V cache?
Answered by
ggerganov
Feb 11, 2024
Replies: 1 comment
Answer selected by
HarewVlad
Yes, it's also more correct since `kv_head` does not necessarily correspond to the size of the KV cache. It works for `main` since the generation is (almost) always single-sequence.

It will use more data though, since we will be storing the full cache regardless of whether it has been used or not. There should be some logic to skip empty cells. Overall, saving the state could be improved.
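
To illustrate the "skip empty cells" idea from the answer, here is a minimal sketch. It does not use llama.cpp's actual cache layout or state format: `ToyCell`, `ToyCache`, `save_full`, `save_sparse`, and `load_sparse` are hypothetical names invented for this example, and "empty" is assumed to mean `pos < 0`. The sketch compares writing every cell with writing only the occupied cells, each tagged with its index so that non-contiguous cells can be restored in place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for a KV cache (NOT llama.cpp's real layout).
struct ToyCell {
    int32_t pos = -1;       // assumption: -1 marks an empty cell
    std::vector<float> v;   // per-cell V data, n_embd floats
};

struct ToyCache {
    uint32_t n_embd;
    std::vector<ToyCell> cells;
    ToyCache(uint32_t size, uint32_t n_embd_) : n_embd(n_embd_), cells(size) {
        for (auto & c : cells) c.v.assign(n_embd, 0.0f);
    }
};

// Append n raw bytes from src to buf.
static void put(std::vector<uint8_t> & buf, const void * src, size_t n) {
    const uint8_t * p = static_cast<const uint8_t *>(src);
    buf.insert(buf.end(), p, p + n);
}

// Baseline: write every cell, whether it has been used or not.
std::vector<uint8_t> save_full(const ToyCache & cache) {
    std::vector<uint8_t> buf;
    for (const auto & c : cache.cells) {
        put(buf, &c.pos, sizeof(c.pos));
        put(buf, c.v.data(), c.v.size() * sizeof(float));
    }
    return buf;
}

// Sparse variant: write a count, then only the non-empty cells with their index.
std::vector<uint8_t> save_sparse(const ToyCache & cache) {
    std::vector<uint8_t> buf;
    uint32_t n_used = 0;
    for (const auto & c : cache.cells) {
        if (c.pos >= 0) n_used++;
    }
    put(buf, &n_used, sizeof(n_used));
    for (uint32_t i = 0; i < cache.cells.size(); ++i) {
        const ToyCell & c = cache.cells[i];
        if (c.pos < 0) continue;  // skip empty cells
        put(buf, &i, sizeof(i));
        put(buf, &c.pos, sizeof(c.pos));
        put(buf, c.v.data(), c.v.size() * sizeof(float));
    }
    return buf;
}

// Restore a cache of the same shape from the sparse buffer.
void load_sparse(ToyCache & cache, const std::vector<uint8_t> & buf) {
    for (auto & c : cache.cells) {
        c.pos = -1;
        c.v.assign(cache.n_embd, 0.0f);
    }
    const uint8_t * p = buf.data();
    uint32_t n_used = 0;
    std::memcpy(&n_used, p, sizeof(n_used)); p += sizeof(n_used);
    for (uint32_t k = 0; k < n_used; ++k) {
        uint32_t i = 0;
        std::memcpy(&i, p, sizeof(i)); p += sizeof(i);
        assert(i < cache.cells.size());
        ToyCell & c = cache.cells[i];
        std::memcpy(&c.pos, p, sizeof(c.pos)); p += sizeof(c.pos);
        std::memcpy(c.v.data(), p, cache.n_embd * sizeof(float));
        p += cache.n_embd * sizeof(float);
    }
}
```

Storing the cell index alongside the data is one possible design choice: it avoids assuming that the used cells form a contiguous prefix, which is the same concern raised about `kv_head` above. The cost is a few extra bytes per saved cell.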