cpu: aarch64: add brgemm bwd data support for block size 8 and 16 #2865

rpushkarr · 2025-03-12T10:20:14Z

Description

This pull request adds support for BRGEMM backward data (BWD) on CPU for the AArch64 architecture, specifically for block sizes 8 and 16. The following changes have been made to enable this support:

New Files Added:

jit_brgemm_bwd_data.cpp – Implements the JIT kernel for BRGEMM backward data.
jit_brgemm_bwd_data.hpp – Defines the interface and data structures for the BRGEMM backward data kernel.

Template Changes:
The brgemm_convolution_fwd_t template has been updated to support inversion by adding a boolean parameter.

Checklist

General

[✓] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
[✓] Have you formatted the code using clang-format?

dzarukin · 2025-03-12T16:30:17Z

src/cpu/aarch64/jit_brgemm_conv.cpp

 template struct brgemm_convolution_fwd_t<sve_256>;
+template struct brgemm_convolution_fwd_t<sve_256, true>;


use_inversion was moved to the convolution op_desc some time ago. I would highly appreciate it if you replaced the template argument of the implementation with code relying on a member of the op_desc.

@dzarukin Thank you for the suggestion. I haven’t worked much on this file yet, so I’d prefer to keep the current implementation as is for now since the PR works without modifying use_inversion. Also, updating use_inversion would require changes in other files, which we can address as a future task. Let me know if that works!

> Also, updating use_inversion would require changes in other files

I don't see it used in other files. The longer this is delayed, the harder it is going to be to remove. I don't see why it would take too long to make the code better. Thanks.

@dzarukin Thank you for the feedback. I've made the changes now — you can review them at your convenience. Let me know if you have any further suggestions.
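
For readers following this thread, here is a minimal sketch of the kind of change being requested: a boolean template argument replaced by a runtime field read from a descriptor. Every type name below is hypothetical and invented for illustration; only the field name use_inversion comes from the review comment, and none of this is the actual oneDNN code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a convolution op descriptor. Only the field
// name `use_inversion` is taken from the review comment above.
struct conv_desc_sketch_t {
    bool use_inversion = false;
};

// Before: the flag is a template argument, so each value needs its own
// explicit instantiation, as in the diff under review.
template <bool use_inversion>
struct conv_templated_sketch_t {
    std::string describe() const {
        return use_inversion ? "inverted" : "regular";
    }
};
template struct conv_templated_sketch_t<false>;
template struct conv_templated_sketch_t<true>;

// After: one class reads the flag from the descriptor at run time.
struct conv_runtime_sketch_t {
    explicit conv_runtime_sketch_t(const conv_desc_sketch_t &desc)
        : desc_(desc) {}

    std::string describe() const {
        return desc_.use_inversion ? "inverted" : "regular";
    }

private:
    conv_desc_sketch_t desc_;
};

// Usage: both variants give the same answer for the same flag value.
inline void usage_inversion_sketch() {
    conv_desc_sketch_t desc;
    desc.use_inversion = true;
    assert(conv_runtime_sketch_t(desc).describe()
            == conv_templated_sketch_t<true>().describe());
}
```

With the runtime variant there is a single type to instantiate and maintain, which is presumably why the reviewer prefers it.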

dzarukin · 2025-03-19T16:20:40Z

src/cpu/aarch64/jit_brgemm_conv_bwd.hpp

+
+    brgemm_convolution_bwd_t(const pd_t *apd) : primitive_t(apd) {};
+
+    ~brgemm_convolution_bwd_t() = default;


Suggested change

-    ~brgemm_convolution_bwd_t() = default;
+    ~brgemm_convolution_bwd_t() override = default;
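
As background for this suggestion, the sketch below shows what marking a destructor override buys: the compiler rejects the declaration unless the base class destructor is virtual, so deletion through a base pointer is known to reach the derived destructor. The class names are hypothetical stand-ins rather than the oneDNN types, and the derived destructor has a body (instead of = default) only so that the effect can be observed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class standing in for a primitive base type.
struct base_sketch_t {
    virtual ~base_sketch_t() = default;
};

// Counts how many derived objects have been destroyed.
static int destroyed_count = 0;

struct derived_sketch_t : public base_sketch_t {
    // `override` fails to compile if the base destructor is not virtual.
    ~derived_sketch_t() override { ++destroyed_count; }
};

// Destroys a derived object through a base pointer and reports how many
// derived destructors ran as a result.
inline int destroy_through_base() {
    const int before = destroyed_count;
    std::unique_ptr<base_sketch_t> p(new derived_sketch_t());
    p.reset();
    return destroyed_count - before;
}

// Usage: exactly one derived destructor runs.
inline void usage_override_sketch() {
    assert(destroy_through_base() == 1);
}
```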

dzarukin · 2025-03-19T16:23:02Z

src/cpu/aarch64/jit_brgemm_conv_bwd.hpp

@@ -0,0 +1,80 @@
+/*******************************************************************************
+* Copyright 2022-2024 Intel Corporation


Intel copyright is not needed here and for the new file above.

dzarukin · 2025-03-19T16:23:28Z

src/cpu/aarch64/jit_brgemm_conv_bwd.cpp

+*******************************************************************************/
+
+#include "common/c_types_map.hpp"
+#include "common/compiler_workarounds.hpp"


This one shouldn't be needed here.

rpushkarr · 2025-03-20T04:33:29Z

@dzarukin Thank you for the approval! I've updated the files and code based on your suggestions.

rpushkarr · 2025-03-20T05:23:21Z

@Radu2k Could you please review this PR at your earliest convenience?

Radu2k

LGTM. Thanks for the contribution!

rpushkarr · 2025-03-21T05:10:00Z

@dzarukin Thanks for the approval! I've made the updates based on your suggestions. Could you please re-review and approve the PR?

dzarukin · 2025-03-21T17:29:23Z

src/cpu/aarch64/jit_brgemm_conv_bwd.hpp

+#include "cpu/cpu_convolution_pd.hpp"
+
+#include "cpu/aarch64/jit_brgemm_1x1_conv.hpp"
+#include "cpu/aarch64/jit_brgemm_conv.hpp"


Nit: it seems these two can move to the cpp file instead.

dzarukin · 2025-03-21T17:29:35Z

src/cpu/aarch64/jit_brgemm_conv_bwd.hpp

+*******************************************************************************/
+
+#ifndef CPU_X64_JIT_BRGEMM_CONV_BWD_HPP
+#define CPU_X64_JIT_BRGEMM_CONV_BWD_HPP


Suggested change

-#define CPU_X64_JIT_BRGEMM_CONV_BWD_HPP
+#define CPU_AARCH64_JIT_BRGEMM_CONV_BWD_HPP

dzarukin · 2025-03-21T17:30:41Z

src/cpu/aarch64/jit_brgemm_conv_bwd.cpp

+            break; // non-1x1 implementation found
+        }
+    }
+    if (it == it.end()) { return status::unimplemented; }


Suggested change

-    if (it == it.end()) { return status::unimplemented; }
+    VDISPATCH_CONV(it != it.end(), "Implementation wasn't found");
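
The pattern behind this suggestion can be sketched as a check macro that both records a reason and returns an "unimplemented" status when its condition fails, so a skipped implementation leaves a trace instead of failing silently. The macro, enum, and function below are hypothetical and written only for illustration; they do not reproduce the real VDISPATCH_CONV definition, and the stderr logging is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical status type for this sketch.
enum class status_sketch_t { success, unimplemented };

// Hypothetical check macro: if `cond` is false, log `msg` and return
// `unimplemented` from the enclosing function.
#define DISPATCH_CHECK_SKETCH(cond, msg) \
    do { \
        if (!(cond)) { \
            std::fprintf(stderr, "dispatch skipped: %s\n", (msg)); \
            return status_sketch_t::unimplemented; \
        } \
    } while (0)

// Scans candidate kernel sizes for the first one that is not 1x1 and
// reports whether such a candidate exists.
inline status_sketch_t pick_non_1x1(const std::vector<int> &kernel_sizes) {
    auto it = kernel_sizes.begin();
    for (; it != kernel_sizes.end(); ++it) {
        if (*it != 1) break; // non-1x1 candidate found
    }
    DISPATCH_CHECK_SKETCH(
            it != kernel_sizes.end(), "Implementation wasn't found");
    return status_sketch_t::success;
}

// Usage: a list containing a 3x3 candidate succeeds.
inline void usage_dispatch_sketch() {
    assert(pick_non_1x1({1, 3}) == status_sketch_t::success);
}
```

Compared with a bare if and return, the macro form keeps the condition and its explanation together at the call site.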

rpushkarr · 2025-03-24T03:54:49Z

@dzarukin @Radu2k I've implemented all the suggested changes. Thank you!

rpushkarr · 2025-03-25T08:17:14Z

@dzarukin I've updated all the changes as requested. Please review and proceed with the merge if there are no further issues. Thank you!

rpushkarr requested review from a team as code owners March 12, 2025 10:20

github-actions bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Mar 12, 2025

Radu2k self-assigned this Mar 12, 2025

dzarukin reviewed Mar 12, 2025

rpushkarr requested review from a team as code owners March 18, 2025 07:24

rpushkarr force-pushed the jit_aarch64_brgemm_bwd_data branch from cb520db to 92ce79e Compare March 18, 2025 07:40

dzarukin approved these changes Mar 19, 2025

rpushkarr force-pushed the jit_aarch64_brgemm_bwd_data branch from 0b7d56a to a892566 Compare March 20, 2025 04:27

rpushkarr requested a review from dzarukin March 20, 2025 05:00

Radu2k approved these changes Mar 20, 2025

dzarukin approved these changes Mar 21, 2025

cpu: aarch64: add brgemm bwd data support for block size 8 and 16

3e6e67b

rpushkarr force-pushed the jit_aarch64_brgemm_bwd_data branch from 53de57f to 3e6e67b Compare March 24, 2025 03:39

Sqvid merged commit 068b775 into uxlfoundation:main Mar 26, 2025
20 of 21 checks passed
