aarch64: matmul: addition of JIT int8 kernel #2686
Conversation
Thanks for the contribution. I haven't looked in detail at the JIT code itself, I first want to understand the API and motivation. Also, do you have any performance numbers and related benchdnn arguments?
cmake/options.cmake
@@ -1,23 +1,23 @@
#===============================================================================
# Copyright 2018-2025 Intel Corporation
#== == == == == == == == == == == == == == == == == == == == == == == == == == == == == == == == == == == == == == == =
Could you revert these whitespace changes please? It could be that your IDE auto-formatted them without you realizing.
Sorry I didn't notice this. Done.
Sure, I will share the performance results shortly.
Please find the performance numbers (f32 vs int8) below, taken on a 32-core Graviton 3E machine at benchdnn level:
These shapes were taken from some of the widely used LLM models. The f32 results were obtained using the command: for int8: Other arguments like scales and zero points can be given to the int8 kernel as shown in the example below:
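(The exact commands did not survive in this page. As a rough illustration only, not the author's actual invocations or shapes, benchdnn matmul runs for f32 and int8 with scales and zero points might look like this:)

```sh
# Hypothetical benchdnn invocations -- shapes and attribute values
# are illustrative, not the ones measured in this PR.

# f32 baseline, performance mode
./benchdnn --matmul --engine=cpu --mode=P --dt=f32 4x1024:1024x4096

# int8 (u8 source, s8 weights, f32 destination)
./benchdnn --matmul --engine=cpu --mode=P --dt=u8:s8:f32 4x1024:1024x4096

# int8 with common (per-tensor) scales and a source zero point
./benchdnn --matmul --engine=cpu --mode=P --dt=u8:s8:f32 \
    --attr-scales=src:common:0.25+wei:common:0.5 \
    --attr-zero-points=src:common:2 4x1024:1024x4096
```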
Tagging @dmitry-gorokhov for OpenVINO
The code is very raw; I have a high level of confidence it would fail on lots of benchdnn cases.
Hi @dzarukin, I will look into these. But there are no failures on the benchdnn test cases; it was verified before raising the PR.
Sorry, that was quite a premature statement. Let me put it this way:
Thanks for the clarification, @dzarukin.
@dzarukin Thanks for the extensive review. Your valuable comments and suggestions are highly appreciated. I have made all the suggested changes; please have a look.
The last big piece I have is header dependencies for the list file, once it's addressed, it's good to go from my side. Thank you.
I have just rebased and force-pushed the commit, but clang-tidy is failing with the error:
Any idea why this is happening?
Can the fork be a reason? I'm not totally sure the linter is designed to work with forks.
I also think it is a linter issue. I checked the tracking info on my local repo; aarch64_matmul_int8 is tracking the correct branch in upstream.
The linter is currently broken for forks. It will be fixed by #2859. It should not block the PR though.
I've just merged the fix, try rebasing again.
Thanks! I have rebased, and it seems to be working fine now.
You've addressed all my comments, thank you! I do have one (non-blocking) question though: is this vector length agnostic? Not that it needs to be, but I can't see any reference to anything vector length specific.
@jondea, thanks for the approval. Right now it's written only for Graviton 3 CPUs, but it can easily be extended to other vector lengths in the future by adjusting the block sizes.
Hi @vpirogov, is there anything to be done from our side that is blocking the merge of this PR?
That's no problem. The reason I asked is that I couldn't see any dispatch logic to protect other ISAs; I had missed this line because I was searching for 128/256/512.
But that looks okay, thanks!
@Shreyas-fuj, the PR is good to go. Thank you for the contribution!
@Shreyas-fuj, there are a couple of comments from @dzarukin that should be addressed before promotion:
Hi @vpirogov, the second comment was not insisted on. I had missed the first comment, thanks.
@vpirogov, I have made the EOL changes you suggested. Thanks.
Hi @vpirogov, I have addressed all the comments. Thanks!
Description
This PR introduces a quantised (int8) matmul kernel in oneDNN, which gives good speed-ups for quantised model inference on Graviton 3 CPUs.
The kernel supports:
- Shapes: 2D, 3D, 4D
- Zero points
- Bias: mask 1xN (f32)
- Dynamic quantisation of the source matrix from f32 to int8 based on the common policy for scales and zero points (this feature is available when the flag ONEDNN_AARCH64_MATMUL_SRC_QUANT is enabled at runtime: export ONEDNN_AARCH64_MATMUL_SRC_QUANT=ON)
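As a rough sketch of what "common policy" dynamic source quantisation means here, the whole f32 source tensor shares a single scale and zero point computed from its min/max at runtime. The code below is illustrative only (names like `compute_common_qparams` are hypothetical, not the PR's JIT implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Affine quantisation parameters: f32 ~= scale * (int8 - zero_point).
struct QuantParams {
    float scale;
    int32_t zero_point;
};

// Hypothetical sketch: one ("common") scale/zero-point pair for the
// whole tensor, derived from its runtime min/max.
QuantParams compute_common_qparams(const std::vector<float> &src) {
    float lo = *std::min_element(src.begin(), src.end());
    float hi = *std::max_element(src.begin(), src.end());
    // Include 0 in the range so that 0.0f maps exactly to an int8 value.
    lo = std::min(lo, 0.0f);
    hi = std::max(hi, 0.0f);
    if (hi == lo) return {1.0f, 0}; // all-zero tensor: any scale works
    const float scale = (hi - lo) / 255.0f; // int8 spans 256 values
    // Map lo to -128: zp = -128 - round(lo / scale).
    const int32_t zp = static_cast<int32_t>(-128 - std::lround(lo / scale));
    return {scale, zp};
}

// Quantise each f32 element with the common parameters, clamping to int8.
std::vector<int8_t> quantise(const std::vector<float> &src, QuantParams q) {
    std::vector<int8_t> out(src.size());
    for (size_t i = 0; i < src.size(); ++i) {
        long v = std::lround(src[i] / q.scale) + q.zero_point;
        v = std::max(-128L, std::min(127L, v));
        out[i] = static_cast<int8_t>(v);
    }
    return out;
}
```

The actual kernel would do this inside the primitive when ONEDNN_AARCH64_MATMUL_SRC_QUANT is enabled, then run the int8 GEMM on the quantised source.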
Checklist
General
- Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?