Is there any KV cache movement when using PP/TP? #13653
leizhenyuan announced in Q&A · 1 comment · 1 reply
-
Your understanding is correct. The API you found is used for prefill disaggregation; in that case we transfer the KV cache from a prefill worker to a decode worker.
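As a rough illustration, here is a minimal sketch of that prefill-disaggregation transfer using plain `torch.distributed` point-to-point ops. This is not vLLM's actual KV-transfer connector; the function names (`send_kv_cache`, `recv_kv_cache`) and the per-layer `(K, V)` layout are assumptions for the example.

```python
# Hypothetical sketch of prefill disaggregation, NOT vLLM's connector API.
# The prefill worker computes the prompt's KV cache and ships it to the decode
# worker, which receives it into preallocated buffers and then decodes as usual.
import torch
import torch.distributed as dist


def send_kv_cache(kv_cache: list[tuple[torch.Tensor, torch.Tensor]], decode_rank: int) -> None:
    """Run on the prefill worker once the prompt has been prefilled."""
    for layer_k, layer_v in kv_cache:  # one (K, V) tensor pair per layer
        dist.send(layer_k.contiguous(), dst=decode_rank)
        dist.send(layer_v.contiguous(), dst=decode_rank)


def recv_kv_cache(kv_cache: list[tuple[torch.Tensor, torch.Tensor]], prefill_rank: int) -> None:
    """Run on the decode worker; buffers must match the sender's shapes and dtypes."""
    for layer_k, layer_v in kv_cache:
        dist.recv(layer_k, src=prefill_rank)
        dist.recv(layer_v, src=prefill_rank)
```

Within a single PP/TP deployment (no disaggregation), no such transfer is needed: the KV cache stays on the worker that produced it.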
-
I am not familiar with distributed inference. Below is my understanding:
For pipeline parallelism, we only need to transfer intermediate tensors between nodes, so there is no KV cache communication.
For tensor parallelism, different heads are allocated to different devices and each head only needs to compute its own output, so there is also no KV cache communication.
But I also found need_recv_kvcache being checked when the worker is a driver worker, so I think my understanding above must be wrong. Can anyone answer this question? Thanks.
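To make the tensor-parallel case concrete, here is a minimal sketch in plain PyTorch (not vLLM internals; `tp_attention_step` and the tensor shapes are made up for illustration). Each rank caches keys/values only for its own head shard and computes attention locally; the only cross-rank traffic is the reduction of the attention output.

```python
# Sketch of why TP needs no KV cache movement: the KV cache is sharded by head,
# so every rank attends over its own cached keys/values. Assumes the process
# group is already initialized and each rank holds `local_heads` heads.
import torch
import torch.distributed as dist


def tp_attention_step(q_local, k_local, v_local, kv_cache_local):
    # q_local/k_local/v_local: [batch, local_heads, head_dim] for this rank's heads
    # kv_cache_local: {"k": [batch, local_heads, seq, head_dim], "v": ...}
    kv_cache_local["k"] = torch.cat([kv_cache_local["k"], k_local.unsqueeze(2)], dim=2)
    kv_cache_local["v"] = torch.cat([kv_cache_local["v"], v_local.unsqueeze(2)], dim=2)

    # Attention over the locally cached keys/values -- no cross-rank KV traffic.
    scores = torch.einsum("bhd,bhsd->bhs", q_local, kv_cache_local["k"])
    probs = torch.softmax(scores / q_local.shape[-1] ** 0.5, dim=-1)
    out_local = torch.einsum("bhs,bhsd->bhd", probs, kv_cache_local["v"])

    # The only communication is the all-reduce of the attention output
    # (in a real model this happens after the row-parallel output projection).
    dist.all_reduce(out_local, op=dist.ReduceOp.SUM)
    return out_local
```

Pipeline parallelism is analogous: each stage keeps the KV cache for its own layers and only the hidden states cross stage boundaries.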