
[WIP][GPU] Expand matmul decomp cases #2916

Open
kealan-barbieri wants to merge 13 commits into main from kealanba/gemm_fpmath_fix

Conversation

@kealan-barbieri (Contributor) commented Mar 18, 2025

Description

Expands the cases where inputs are decompressed to cover integer activations as well as integer weights. Integer weights are still required, but when an fpmath mode is set, integer activations are now upconverted as well.

This means that cases intended to use integer accumulation must not supply, e.g., attr-fpmath=f16:true, as all such cases are upconverted per https://jira.devtools.intel.com/browse/MFDNN-13380.
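
For reference, a minimal sketch (assumptions noted; not code from this PR) of how a user opts into this behavior via the primitive attributes. It assumes a oneDNN version that provides the two-argument apply_to_int overload of primitive_attr::set_fpmath_mode, which corresponds to the attr-fpmath=f16:true notation used above in benchdnn.

#include "oneapi/dnnl/dnnl.hpp"

// Sketch: request f16 fpmath with apply_to_int enabled so that integer
// activations and weights become eligible for upconversion. With this
// attribute set, integer accumulation is not used for the matmul.
dnnl::primitive_attr make_decompression_attr() {
    dnnl::primitive_attr attr;
    attr.set_fpmath_mode(dnnl::fpmath_mode::f16, /*apply_to_int=*/true);
    return attr;
}

The attribute is then passed to the matmul primitive descriptor as usual; without apply_to_int (or without any fpmath mode), integer inputs keep their integer accumulation path.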

Fixes # (github issue)

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@kealan-barbieri kealan-barbieri requested a review from a team as a code owner March 18, 2025 22:34
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Mar 18, 2025
@kealan-barbieri kealan-barbieri force-pushed the kealanba/gemm_fpmath_fix branch from 2f3e014 to e3a9503 Compare March 19, 2025 00:29
@kealan-barbieri kealan-barbieri requested review from a team as code owners March 19, 2025 00:29
@github-actions github-actions bot added platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 component:tests Codeowner: @oneapi-src/onednn-arch labels Mar 19, 2025
@kealan-barbieri kealan-barbieri changed the base branch from main to dzarukin/allow_int4_odd March 19, 2025 00:29
@dzarukin dzarukin force-pushed the dzarukin/allow_int4_odd branch 3 times, most recently from b34fab9 to 2e2c013 Compare March 19, 2025 19:43
});

add_mode_matches(fpmath_bf16, [](Type dt) -> const char * {
if (dt.isInt8() || dt.isInt4()) return "B";
@rjoursler (Contributor) commented Mar 19, 2025


This is invalid: it will just lead to out-of-register issues like those fixed in #2467, since kernels optimized for B may not have enough spare registers for upconversion. If some of our kernel strategies on B are functional for [OB], then they should just be marked as such.

Additionally, we should not need multiple calls to add_mode_matches; instead, we should have something like:

add_mode_matches(fpmath_bf16, [](Type dt) -> const char * {
    if (dt == Type::f32) return "[SB]";
    if (dt.isInt8() || dt.isInt4()) return "[OB]";
    if (dt.isF8()) return "B"; // This seems invalid and could lead to out-of-register issues; need to determine the appropriate specifier for GEMM.
    return nullptr; // Fallback so the lambda always returns a value.
});

@kealan-barbieri (Contributor, Author)

I see what you mean, but we need to strike a balance between supporting only specifically optimized cases and out-of-the-box functionality. There is also an issue with the current approach: there is no fallback from "[OB]"-type strategies to "[B]"-type due to the selection process.

I suggest we take the opposite approach and mark the strategies that will not tolerate upconversion. Optimized strategies with "[OB]" tags should be preferred on a performance basis when they're usable.

@rjoursler (Contributor)

> I suggest we take the opposite approach and mark strategies that will not tolerate upconversion.

This is what we are doing already; it is just that B is how we mark that the kernel does not support upconversion. 😉

@kealan-barbieri (Contributor, Author)

Refactored: renamed all "O", "H", "S" strategies to "[OH]", "H", "S" to conform with the naming convention.

@dzarukin dzarukin force-pushed the dzarukin/allow_int4_odd branch 4 times, most recently from f4d86c8 to 8e04316 Compare March 20, 2025 04:43
@kealan-barbieri kealan-barbieri force-pushed the kealanba/gemm_fpmath_fix branch from e3a9503 to 4375e7a Compare March 21, 2025 16:47
@github-actions github-actions bot removed platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 component:tests Codeowner: @oneapi-src/onednn-arch labels Mar 21, 2025
@kealan-barbieri kealan-barbieri force-pushed the kealanba/gemm_fpmath_fix branch from 4375e7a to 6892eb5 Compare March 21, 2025 17:10
Temporary "const char *" objects can disappear while getting to the
parser internals. Moving strings to parse into a permanent container
solves the problem.
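
As an illustrative sketch only (assumed names; not the actual oneDNN parser code), the lifetime issue this commit describes and the fix look roughly like this: the parser keeps the pointer it is given, so the pointed-to string must outlive the call.

#include <deque>
#include <string>

// Hypothetical parser hook (assumption): it stores the pointer and reads it
// again later, so the pointee must stay alive after the call returns.
void parse_option(const char *text);

void broken(const std::string &prefix) {
    // BUG: the temporary string is destroyed at the end of this statement,
    // so the pointer retained by the parser dangles afterwards.
    parse_option((prefix + ":true").c_str());
}

// Fix: keep the composed strings in a long-lived container. A deque is used
// here because growing it does not invalidate references to existing elements.
std::deque<std::string> option_storage;

void fixed(const std::string &prefix) {
    option_storage.push_back(prefix + ":true");
    parse_option(option_storage.back().c_str()); // Remains valid later.
}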
@dzarukin dzarukin force-pushed the dzarukin/allow_int4_odd branch from 8e04316 to 608baa3 Compare March 21, 2025 18:27
@kealan-barbieri kealan-barbieri force-pushed the kealanba/gemm_fpmath_fix branch from 6892eb5 to ebf1fa9 Compare March 24, 2025 16:45
@vpirogov vpirogov force-pushed the dzarukin/allow_int4_odd branch from 608baa3 to f933c1e Compare March 24, 2025 17:50
Base automatically changed from dzarukin/allow_int4_odd to main March 24, 2025 19:24
@kealan-barbieri (Contributor, Author) commented:

make test
disable device_cpu
enable device_gpu
disable benchdnn_all
enable benchdnn_matmul

@kealan-barbieri kealan-barbieri changed the title [GPU] Expand matmul decomp cases [WIP][GPU] Expand matmul decomp cases Mar 26, 2025
Labels: platform:gpu-intel (Codeowner: @oneapi-src/onednn-gpu-intel)
3 participants