Commit a87851d

[Snippets][CPU] Disabled dynamic MHA tokenization if rtCache is not used (openvinotoolkit#26376)
### Details:
- *To reduce the overhead of ShapeInference and CodeGeneration for dynamic Subgraphs, the CPU node Subgraph uses the plugin's Runtime Cache. If the Runtime Cache capacity is zero, dynamic subgraphs should not be tokenized, because tokenizing them leads to performance degradation. This PR disables dynamic MHA tokenization if `config.rtCacheCapacity == 0`.*

### Tickets:
- *150951*
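To illustrate why a zero-capacity cache makes dynamic subgraphs expensive, here is a minimal sketch of an LRU cache keyed by input shape. It is not the plugin's actual runtime cache: `CompiledKernel`, `KernelCache`, and `count_builds` are hypothetical names invented for this example, and the "build" step merely stands in for shape inference plus code generation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated kernel; not an OpenVINO type.
struct CompiledKernel {
    std::string shape_key;
};

// Minimal LRU cache sketch (assumption: the real cache is more elaborate).
// With capacity == 0 nothing is retained, so every lookup rebuilds.
class KernelCache {
public:
    explicit KernelCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the kernel for `key`, invoking `build` only on a miss.
    CompiledKernel get_or_create(
        const std::string& key,
        const std::function<CompiledKernel(const std::string&)>& build) {
        assert(build && "a builder callback is required");
        auto it = index_.find(key);
        if (it != index_.end()) {
            // Hit: mark the entry as most recently used.
            entries_.splice(entries_.begin(), entries_, it->second);
            return it->second->second;
        }
        ++build_count_;
        CompiledKernel kernel = build(key);
        if (capacity_ == 0)
            return kernel;  // caching disabled: the result is not stored
        if (entries_.size() == capacity_) {
            // Evict the least recently used entry.
            index_.erase(entries_.back().first);
            entries_.pop_back();
        }
        entries_.emplace_front(key, kernel);
        index_[key] = entries_.begin();
        return kernel;
    }

    std::size_t build_count() const { return build_count_; }

private:
    using Entry = std::pair<std::string, CompiledKernel>;
    std::size_t capacity_;
    std::size_t build_count_ = 0;
    std::list<Entry> entries_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};

// Counts how many times the expensive build step runs for a shape sequence.
std::size_t count_builds(std::size_t capacity,
                         const std::vector<std::string>& shapes) {
    KernelCache cache(capacity);
    for (const auto& shape : shapes)
        cache.get_or_create(shape, [](const std::string& key) {
            return CompiledKernel{key};
        });
    return cache.build_count();
}
```

With the same shape requested three times, `count_builds(0, ...)` rebuilds on every call, while any non-zero capacity builds once and reuses the result. That repeated rebuild is the overhead the commit avoids by not tokenizing dynamic MHA when the cache is disabled.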
1 parent 00e5635 commit a87851d

1 file changed, +7 -2 lines changed

src/plugins/intel_cpu/src/transformations/transformation_pipeline.cpp

@@ -896,16 +896,21 @@ void Transformations::MainSnippets(void) {
     size_t concurrency = config.streamExecutorConfig.get_threads_per_stream();
     if (concurrency == 0)
         concurrency = parallel_get_max_threads();
+
+    // Runtime caching should be enabled in case of dynamic Subgraphs in CPU Plugin: to reduce overheads of ShapeInference and CodeGeneration
+    // If runtime cache capacity is zero, it means that rtCache won't be used and
+    // we shouldn't tokenize dynamic Subgraphs - it will lead to performance degradations
+    bool is_dynamic_mha_token_enabled = config.rtCacheCapacity != 0;
 #if defined(OPENVINO_ARCH_ARM64)
     // ARM has 32 gprs. After excluding 2 registers for work amounts, 1 register for runtime parameters, 1 platform register,
     // 3 registers for temporary use, and 2 stack related registers, it has 23 remaining registers.
     size_t data_ptr_gpr_count = 23;
-    bool is_dynamic_mha_token_enabled = false;
+    // ARM doesn't even support MHA yet
+    is_dynamic_mha_token_enabled = false;
 #else
     // X64 has 16 gprs. After excluding 2 registers for work amounts, 1 register for runtime parameters,
     // and 2 stack related registers, it has 11 remaining registers.
     size_t data_ptr_gpr_count = 11;
-    bool is_dynamic_mha_token_enabled = true;
 #endif
     // The optimization "SplitDimensionM" depends on target machine (thread count).
     // To avoid uncontrolled behavior in tests, we disabled the optimization when there is Config::SnippetsMode::IgnoreCallback
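
The resulting decision logic and the register arithmetic from the comments can be restated as a small standalone sketch. This is an illustration under assumptions, not the plugin code: the real source selects the architecture at compile time with `OPENVINO_ARCH_ARM64`, whereas the hypothetical `Arch` enum, the trimmed-down `Config` struct, and both helper functions below exist only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, trimmed-down config holding only the field used in the diff.
struct Config {
    std::size_t rtCacheCapacity = 0;
};

// Hypothetical runtime stand-in for the compile-time architecture macro.
enum class Arch { X64, ARM64 };

// General-purpose registers left for data pointers, following the
// reservations listed in the source comments.
constexpr std::size_t data_ptr_gpr_count(Arch arch) {
    return arch == Arch::ARM64
               // 32 gprs: work amounts, runtime params, platform,
               // temporaries, stack
               ? 32 - 2 - 1 - 1 - 3 - 2
               // 16 gprs: work amounts, runtime params, stack
               : 16 - 2 - 1 - 2;
}

static_assert(data_ptr_gpr_count(Arch::ARM64) == 23, "ARM64 budget");
static_assert(data_ptr_gpr_count(Arch::X64) == 11, "X64 budget");

// Dynamic MHA tokenization is enabled only when the runtime cache is in use
// and the target is not ARM64 (which, per the diff, has no MHA support yet).
inline bool is_dynamic_mha_token_enabled(const Config& config, Arch arch) {
    if (arch == Arch::ARM64)
        return false;
    return config.rtCacheCapacity != 0;
}

// Small self-check that exercises the three cases discussed in the commit.
inline void check_mha_flag_examples() {
    Config cfg;
    assert(!is_dynamic_mha_token_enabled(cfg, Arch::X64));   // cache disabled
    cfg.rtCacheCapacity = 1;
    assert(is_dynamic_mha_token_enabled(cfg, Arch::X64));    // cache enabled
    assert(!is_dynamic_mha_token_enabled(cfg, Arch::ARM64)); // always off
}
```

Before this commit the x64 branch set the flag to `true` unconditionally. The change makes it depend on `config.rtCacheCapacity` and leaves ARM64 forced to `false`.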
