
Commit dcdfdc5

[GPU] Minor change (Added comment for kv cache prealloc policy) (#25529)
### Details:
 - Added detailed description about the kv cache prealloc policy

### Tickets:
 - *ticket-id*
1 parent ce32bfe commit dcdfdc5

2 files changed: +9 -1 lines changed


src/plugins/intel_gpu/src/graph/include/kv_cache_inst.h

@@ -52,7 +52,6 @@ class typed_primitive_inst<kv_cache> : public typed_primitive_inst_base<kv_cache
 
     static std::string to_string(const kv_cache_node& node);
 
-    // Distribute prealloc period to prevent memory peak
     int32_t get_prealloc_iter_num() override;
 
     static void update_pad(layout& l, int64_t pad, int64_t sequence_axis_legacy) {

src/plugins/intel_gpu/src/graph/kv_cache.cpp

@@ -70,6 +70,15 @@ std::string kv_cache_inst::to_string(const kv_cache_node& node) {
 }
 
 int32_t kv_cache_inst::get_prealloc_iter_num() {
+    // - When a kv_cache_inst runs out of the pre-allocated memory and requires additional memory,
+    //   it allocates new memory and then copies the data from the original memory to the new memory.
+    //   Since the original memory is still assigned to the ReadValue even after the copy has finished,
+    //   we hold 2x memory for the kv cache. The original memory is released when the ReadValue is
+    //   called, i.e., at the next iteration.
+    // - If this alloc/copy happens at the same time for all the kv cache memory, there will be a memory peak at that
+    //   iteration.
+    // - Therefore, to avoid the allocation and copying occurring simultaneously for all the kv_cache_insts,
+    //   we assign a different prealloc size to each kv cache so that the memory peak is prevented.
     return 128 + kv_cache_id % 64;
 }
 
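
The effect of the staggered formula in the diff above can be sketched as a small standalone model. This is only an illustration, not the plugin's code: it assumes every cache starts from the same state and first runs out of memory after exactly its prealloc iteration count, and it looks only at that first reallocation. `staggered_prealloc_iter_num` copies the expression from the diff; `uniform_prealloc_iter_num` and `max_simultaneous_reallocs` are hypothetical helpers introduced here for comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Same expression as in the diff: 128 + kv_cache_id % 64.
constexpr int32_t staggered_prealloc_iter_num(int32_t kv_cache_id) {
    return 128 + kv_cache_id % 64;
}

// Hypothetical baseline for comparison: every cache uses the same period.
constexpr int32_t uniform_prealloc_iter_num(int32_t /*kv_cache_id*/) {
    return 128;
}

// Hypothetical helper: returns the largest number of caches that share one
// prealloc period. Under the simplified model described above, those caches
// would hit their first alloc/copy in the same iteration.
template <typename Policy>
int32_t max_simultaneous_reallocs(int32_t num_caches, Policy policy) {
    assert(num_caches >= 0);
    std::map<int32_t, int32_t> caches_per_period;
    int32_t worst = 0;
    for (int32_t id = 0; id < num_caches; ++id) {
        // Count this cache under its period and track the largest group so far.
        worst = std::max(worst, ++caches_per_period[policy(id)]);
    }
    return worst;
}
```

In this model, calling `max_simultaneous_reallocs(32, uniform_prealloc_iter_num)` groups all 32 caches into one iteration, while the staggered policy gives each of them a distinct period. Because the offset is taken modulo 64, ids that differ by a multiple of 64 still share a period.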
