[Snippets] Support Brgemm with transposed_b via BrgemmCopyB (#24932)
### Details:
- *Support FP32/BF16/I8 matmuls with transpose_b=true via BrgemmCopyB*
- *BrgemmCopyB emitter: handle tail iteration by N before the main body*
- *Remove workaround on LDB and N dim rounding in brgemm emitters and related buffers*
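To make the first two points concrete, here is a minimal sketch in plain C++, assuming FP32 data and plain row-major layouts. With transpose_b=true, B is stored as [N][K]. A copy-B style repacking can rewrite it into [K][N] so that an ordinary, non-transposed matmul kernel can consume it. The helper walks N in blocks of `n_blk` and copies the tail block first, echoing the "tail iteration by N before the main body" ordering. `repack_transposed_b`, `matmul` and `n_blk` are hypothetical names. The sketch does not model the real BrgemmCopyB kernel, its output layout, or BF16/I8 data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: repack B stored as [N][K] (transpose_b = true) into a
// plain row-major [K][N] buffer. N is walked in blocks of n_blk; the tail
// block (N % n_blk columns) is copied first, then the full blocks.
std::vector<float> repack_transposed_b(const std::vector<float>& b_nk,
                                       std::size_t N, std::size_t K,
                                       std::size_t n_blk) {
    assert(n_blk > 0 && b_nk.size() == N * K);
    std::vector<float> b_kn(K * N);
    auto copy_block = [&](std::size_t n_begin, std::size_t n_len) {
        for (std::size_t n = n_begin; n < n_begin + n_len; ++n)
            for (std::size_t k = 0; k < K; ++k)
                b_kn[k * N + n] = b_nk[n * K + k];
    };
    const std::size_t tail = N % n_blk;
    if (tail != 0)
        copy_block(N - tail, tail);  // tail iteration: the last `tail` columns
    for (std::size_t n = 0; n + n_blk <= N; n += n_blk)
        copy_block(n, n_blk);        // main body: full n_blk-wide blocks
    return b_kn;
}

// Plain matmul on the repacked buffer: C[M][N] = A[M][K] * B_kn[K][N].
std::vector<float> matmul(const std::vector<float>& a, const std::vector<float>& b_kn,
                          std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t n = 0; n < N; ++n)
                c[m * N + n] += a[m * K + k] * b_kn[k * N + n];
    return c;
}
```

In this sketch, copying the tail first only changes the order of the writes. Every column lands in the same place, so the repacked buffer is identical to one produced with the tail handled last.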
### Tickets:
- *CVS-114487*
## TODO:
- [ ] BufferAllocation test for FP32 brgemm with repacking
- [ ] SetBrgemmCopyBBuffersShape tests
- [ ] MHA with transpose B for low precisions (FP32 already exists)
- [ ] FuseTransposeBrgemm tests
src/common/snippets/docs/mha_optimization_guide.md (+2 -2)
@@ -123,7 +123,7 @@ For enhancing the execution efficiency, blocking across the M, K, and N matmul d

 ### Blocking Parameters

-The heuristics for determining the optimal block sizes can be found in [SetBrgemmCPUBlockingParams](../../../plugins/intel_cpu/src/transformations/snippets/x64/pass/set_brgemm_cpu_blocking_params.cpp).
+The heuristics for determining the optimal block sizes can be found in [BrgemmCPUBlocking](../../../plugins/intel_cpu/src/transformations/snippets/x64/pass/lowered/brgemm_cpu_blocking.cpp).

 **Please note: Blocking by M dimension is shared between both Brgemms. Please see [SplitLoops](../include/snippets/lowered/pass/split_loops.hpp) lowered pass for the details.**

@@ -141,7 +141,7 @@ Based on previously discussed information, we provide the following recommendati
 In local experiments, some transformations might be worth to change:
 - Disable [ExtractUnsupportedTransposes](#extractunsupportedtransposes) transformation in order to benchmark Snippets Transpose implementation.
 - Adjust [SplitDimensionM](#splitdimensionm) heuristics in order to benchmark another splitting, or disable the pass at all.
-3.[Blocking parameters](#blocking-parameters): adjust blocking heuristics in `SetBrgemmCPUBlockingParams`.
+3.[Blocking parameters](#blocking-parameters): adjust blocking heuristics in `BrgemmCPUBlocking`.
 - Please note that there are 2 Matmul nodes inside a single MHA, and each Matmul can have his own optimal K, N blocking params.
 M block is better to keep the same since the corresponding blocking loop is shared between both Matmuls.
 - For the BF16/INT8 blocking loops, 2 options are possible: blocking can be done only for Brgemm node, or for BrgemmCopyB repacking too.
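The note that the M blocking loop is shared between both Matmuls can be pictured with the following plain C++ sketch of FP32 MHA, O = softmax(Q * K^T) * V. A single outer loop over M blocks of size `m_blk` drives both matmuls, so only an `m_blk x N` slice of the intermediate scores is live at a time. The first matmul reads K in its [N][D] form, i.e. the transpose_b=true case. The function `mha_m_blocked` and its parameters are hypothetical. K/N blocking, BrgemmCopyB repacking and the heuristics in `BrgemmCPUBlocking` are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration: Q is [M][D], K is [N][D], V is [N][Dv], O is [M][Dv].
std::vector<float> mha_m_blocked(const std::vector<float>& q, const std::vector<float>& k,
                                 const std::vector<float>& v, std::size_t M, std::size_t N,
                                 std::size_t D, std::size_t Dv, std::size_t m_blk) {
    assert(m_blk > 0 && N > 0);
    std::vector<float> out(M * Dv, 0.0f);
    std::vector<float> s(m_blk * N);  // scratch: S rows of the current M block only
    for (std::size_t m0 = 0; m0 < M; m0 += m_blk) {
        const std::size_t rows = std::min(m_blk, M - m0);  // last block may be a tail
        // Matmul 1 (transpose_b): S[r][n] = sum_d Q[m0 + r][d] * K[n][d]
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t n = 0; n < N; ++n) {
                float acc = 0.0f;
                for (std::size_t d = 0; d < D; ++d)
                    acc += q[(m0 + r) * D + d] * k[n * D + d];
                s[r * N + n] = acc;
            }
        // Row-wise softmax over the rows of this block
        for (std::size_t r = 0; r < rows; ++r) {
            float* row = &s[r * N];
            const float mx = *std::max_element(row, row + N);
            float sum = 0.0f;
            for (std::size_t n = 0; n < N; ++n) {
                row[n] = std::exp(row[n] - mx);
                sum += row[n];
            }
            for (std::size_t n = 0; n < N; ++n) row[n] /= sum;
        }
        // Matmul 2, inside the same M block: O[m0 + r][j] = sum_n S[r][n] * V[n][j]
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t n = 0; n < N; ++n)
                for (std::size_t j = 0; j < Dv; ++j)
                    out[(m0 + r) * Dv + j] += s[r * N + n] * v[n * Dv + j];
    }
    return out;
}
```

Because each block of S rows is produced, normalized and consumed within one iteration of the M loop, changing `m_blk` changes the scheduling but not the math. The last block simply has fewer rows when M is not a multiple of `m_blk`.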