
[WIP][GPU] Expand matmul decomp cases #2916

Open
kealan-barbieri wants to merge 13 commits into main from kealanba/gemm_fpmath_fix

Conversation

@kealan-barbieri (Contributor) commented Mar 18, 2025

Description

Expands the cases where inputs are decompressed to cover integer activations as well as integer weights. Integer weights are still required, but when an fpmath mode is set, integer activations are now upconverted as well.

This means that cases intended to use integer accumulation must not supply, e.g., attr-fpmath=f16:true, as all such cases are upconverted per https://jira.devtools.intel.com/browse/MFDNN-13380.
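
For reference, a minimal sketch (assumptions noted; not code from this PR) of how a user opts into this behavior via the primitive attributes. It assumes a oneDNN version that provides the two-argument apply_to_int overload of primitive_attr::set_fpmath_mode, which corresponds to the attr-fpmath=f16:true notation used above in benchdnn.

#include "oneapi/dnnl/dnnl.hpp"

// Sketch: request f16 fpmath with apply_to_int enabled so that integer
// activations and weights become eligible for upconversion. With this
// attribute set, integer accumulation is not used for the matmul.
dnnl::primitive_attr make_decompression_attr() {
    dnnl::primitive_attr attr;
    attr.set_fpmath_mode(dnnl::fpmath_mode::f16, /*apply_to_int=*/true);
    return attr;
}

The attribute is then passed to the matmul primitive descriptor as usual; without apply_to_int (or without any fpmath mode), integer inputs keep their integer accumulation path.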

Fixes # (github issue)

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@kealan-barbieri kealan-barbieri requested a review from a team as a code owner March 18, 2025 22:34
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Mar 18, 2025
@kealan-barbieri kealan-barbieri force-pushed the kealanba/gemm_fpmath_fix branch from 2f3e014 to e3a9503 Compare March 19, 2025 00:29
@kealan-barbieri kealan-barbieri requested review from a team as code owners March 19, 2025 00:29
@github-actions github-actions bot added platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 component:tests Codeowner: @oneapi-src/onednn-arch labels Mar 19, 2025
@kealan-barbieri kealan-barbieri changed the base branch from main to dzarukin/allow_int4_odd March 19, 2025 00:29
@dzarukin dzarukin force-pushed the dzarukin/allow_int4_odd branch 3 times, most recently from b34fab9 to 2e2c013 Compare March 19, 2025 19:43
});

add_mode_matches(fpmath_bf16, [](Type dt) -> const char * {
if (dt.isInt8() || dt.isInt4()) return "B";
@rjoursler (Contributor) commented Mar 19, 2025


This is invalid: it will just lead to out-of-register issues like those fixed in #2467, since kernels optimized for B may not have enough spare registers for upconversion. If some of our kernel strategies on B are functional for [OB], then they should just be marked as such.

Additionally, we should not need multiple calls to add_mode_matches; instead, we should have something like:

add_mode_matches(fpmath_bf16, [](Type dt) -> const char * {
    if (dt == Type::f32) return "[SB]";
    if (dt.isInt8() || dt.isInt4()) return "[OB]";
    if (dt.isF8()) return "B"; // This seems invalid and could lead to out-of-register issues; need to determine the appropriate specifier for GEMM.
    return nullptr; // Fallback so the lambda always returns a value.
});

@kealan-barbieri (Contributor, Author)

I see what you mean, but we need to strike a balance between supporting only specifically optimized cases and out-of-the-box functionality. There is also an issue with the current approach: there is no fallback from "[OB]"-type strategies to "[B]"-type due to the selection process.

I suggest we take the opposite approach and mark the strategies that will not tolerate upconversion. Optimized strategies with "[OB]" tags should be preferred on a performance basis when they're usable.

@rjoursler (Contributor)

> I suggest we take the opposite approach and mark strategies that will not tolerate upconversion.

This is what we are doing already; it is just that B is how we mark that the kernel does not support upconversion. 😉

@kealan-barbieri (Contributor, Author)

Refactored: renamed all "O", "H", "S" strategies to "[OH]", "H", "S" to conform with the naming convention.

@dzarukin dzarukin force-pushed the dzarukin/allow_int4_odd branch 4 times, most recently from f4d86c8 to 8e04316 Compare March 20, 2025 04:43
@kealan-barbieri kealan-barbieri force-pushed the kealanba/gemm_fpmath_fix branch from e3a9503 to 4375e7a Compare March 21, 2025 16:47
@github-actions github-actions bot removed platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 component:tests Codeowner: @oneapi-src/onednn-arch labels Mar 21, 2025
@kealan-barbieri kealan-barbieri force-pushed the kealanba/gemm_fpmath_fix branch from 4375e7a to 6892eb5 Compare March 21, 2025 17:10
Temporary "const char *" objects can disappear while getting to the
parser internals. Moving strings to parse into a permanent container
solves the problem.
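
As an illustrative sketch only (assumed names; not the actual oneDNN parser code), the lifetime issue this commit describes and the fix look roughly like this: the parser keeps the pointer it is given, so the pointed-to string must outlive the call.

#include <deque>
#include <string>

// Hypothetical parser hook (assumption): it stores the pointer and reads it
// again later, so the pointee must stay alive after the call returns.
void parse_option(const char *text);

void broken(const std::string &prefix) {
    // BUG: the temporary string is destroyed at the end of this statement,
    // so the pointer retained by the parser dangles afterwards.
    parse_option((prefix + ":true").c_str());
}

// Fix: keep the composed strings in a long-lived container. A deque is used
// here because growing it does not invalidate references to existing elements.
std::deque<std::string> option_storage;

void fixed(const std::string &prefix) {
    option_storage.push_back(prefix + ":true");
    parse_option(option_storage.back().c_str()); // Remains valid later.
}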
@dzarukin dzarukin force-pushed the dzarukin/allow_int4_odd branch from 8e04316 to 608baa3 Compare March 21, 2025 18:27
@kealan-barbieri kealan-barbieri force-pushed the kealanba/gemm_fpmath_fix branch from 6892eb5 to ebf1fa9 Compare March 24, 2025 16:45
@vpirogov vpirogov force-pushed the dzarukin/allow_int4_odd branch from 608baa3 to f933c1e Compare March 24, 2025 17:50
Base automatically changed from dzarukin/allow_int4_odd to main March 24, 2025 19:24
@kealan-barbieri (Contributor, Author) commented:

make test
disable device_cpu
enable device_gpu
disable benchdnn_all
enable benchdnn_matmul

@kealan-barbieri kealan-barbieri changed the title [GPU] Expand matmul decomp cases [WIP][GPU] Expand matmul decomp cases Mar 26, 2025
Labels: platform:gpu-intel (Codeowner: @oneapi-src/onednn-gpu-intel)
3 participants