ref: sycl: enable zp support in reorder and skip unsupported tests #2877

Open

AD2605 wants to merge 1 commit into main from atharva/reorder_blocked_format

Conversation

AD2605 (Contributor) commented Mar 13, 2025

Description

This PR enables zero-point support in the SYCL backend's reorder and adds skips in benchdnn for cases that the SYCL backend does not support.
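For context, this is roughly how a user requests a zero point on a reorder through the public oneDNN C++ API. The sketch below is illustrative only: shapes, data types, and the zero-point value are made up, and whether the generic SYCL backend accepts this exact configuration is precisely what this PR is about; it is not code from the PR.

    // Illustrative sketch: a reorder with a runtime destination zero point.
    #include "oneapi/dnnl/dnnl.hpp"

    int main() {
        using namespace dnnl;
        engine eng(engine::kind::gpu, 0); // a SYCL GPU engine is assumed here
        stream strm(eng);

        memory::desc src_md({8, 16}, memory::data_type::f32, memory::format_tag::ab);
        memory::desc dst_md({8, 16}, memory::data_type::s8, memory::format_tag::ab);
        memory src(src_md, eng), dst(dst_md, eng); // src contents omitted for brevity

        // Request a single (mask = 0) zero point on DST, supplied at execution time.
        primitive_attr attr;
        attr.set_zero_points_mask(DNNL_ARG_DST, /* mask = */ 0);

        reorder::primitive_desc rpd(eng, src_md, eng, dst_md, attr);
        reorder r(rpd);

        // The zero point itself is passed as a 1-element s32 memory at execution.
        memory::desc zp_md({1}, memory::data_type::s32, memory::format_tag::x);
        memory zp_mem(zp_md, eng);
        void *p = zp_mem.map_data();
        static_cast<int32_t *>(p)[0] = 3;
        zp_mem.unmap_data(p);

        r.execute(strm, {{DNNL_ARG_FROM, src}, {DNNL_ARG_TO, dst},
                         {DNNL_ARG_ATTR_ZERO_POINTS | DNNL_ARG_DST, zp_mem}});
        strm.wait();
        return 0;
    }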

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@AD2605 AD2605 requested review from a team as code owners March 13, 2025 17:48
@github-actions github-actions bot added platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic component:tests Codeowner: @oneapi-src/onednn-arch labels Mar 13, 2025
@AD2605 AD2605 force-pushed the atharva/reorder_blocked_format branch from acca961 to 9cf8e34 on March 13, 2025 17:51
@github-actions github-actions bot removed the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Mar 13, 2025
@AD2605 AD2605 force-pushed the atharva/reorder_blocked_format branch from 9cf8e34 to 886bdc7 on March 13, 2025 17:52
AD2605 (Contributor, Author) commented Mar 13, 2025

make test
disable benchdnn_all
disable build_vendor_intel
disable build_vendor_intel
disable test_device_cpu
enable arch_gpu_ampere
enable benchdnn_reorder

@AD2605 AD2605 changed the title from "ref: sycl: enable zp support in reorder and skip unsupported benchdnn" to "ref: sycl: enable zp support in reorder and skip unsupported tests" on Mar 13, 2025
@atkassen atkassen removed the request for review from a team March 13, 2025 19:10
// returning false unnecessarily
const auto src_scale_dt = attr()->scales_.get_data_type(
DNNL_ARG_ATTR_SCALES | DNNL_ARG_FROM);
if (!is_supported_type(src_scale_dt)) { return false; }
Contributor
Minor: add verbose support for this line.

Contributor Author
The verbose is handled above; this function call is wrapped in the VDISPATCH_REORDER macro,
see https://github.com/uxlfoundation/oneDNN/blob/1ae1905345983d74c302ac12f8d9f9ada7022ab5/src/gpu/generic/sycl/ref_reorder.hpp#L63C1-L64C51

This change just fixes a bug in that function.

dzarukin (Contributor) commented Mar 14, 2025
It's generally better to lower it from the upper level to the exact places of failing, instead of having to figure out which condition triggered it, especially if it's more than one.
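For illustration, here is a small self-contained toy of the pattern being suggested: each condition carries its own diagnostic at the point where it fails, so the user does not have to guess which check rejected the case. The CHECK_OR_REPORT macro below is only a stand-in for oneDNN's VDISPATCH_REORDER-style macros, not the real one.

    // Toy version of "report the exact failing condition", not oneDNN code.
    #include <cstdio>

    #define CHECK_OR_REPORT(cond, msg) \
        do { \
            if (!(cond)) { \
                std::printf("dispatch failed: %s\n", (msg)); \
                return false; \
            } \
        } while (0)

    enum class data_type { f32, f16, s8, s4 };

    static bool is_supported_type(data_type dt) {
        return dt == data_type::f32 || dt == data_type::f16 || dt == data_type::s8;
    }

    // Each check has its own message, so the log names the condition that failed.
    bool check_scales(data_type src_scale_dt, data_type dst_scale_dt) {
        CHECK_OR_REPORT(is_supported_type(src_scale_dt), "unsupported src scale data type");
        CHECK_OR_REPORT(is_supported_type(dst_scale_dt), "unsupported dst scale data type");
        return true;
    }

    int main() {
        check_scales(data_type::f32, data_type::s4); // prints the dst-scale message
        return 0;
    }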

DNNL_ARG_ATTR_SCALES | DNNL_ARG_TO)) {
const auto dst_scale_dt = attr()->scales_.get_data_type(
DNNL_ARG_ATTR_SCALES | DNNL_ARG_TO);
if (!is_supported_type(dst_scale_dt)) { return false; }
Contributor
The same comments apply to this section.

Comment on lines 36 to 37
// To cover cases when src has more padding than destination, in that case
// simply setting the range as dst_nelems does not cover all the source elements
Contributor
I'm not sure I'm following the comment. Could you re-phrase, please?

Contributor Author
I'll remove this comment; it is a leftover from supporting blocked formats in reorder and no longer holds its meaning.

Contributor Author
Removed

float dst_zp = conf_.apply_dst_zp ? dst_zp_tensor.load(0) : 0;

for (std::size_t flattened_index = item.get_global_id(0);
flattened_index < conf_.num_elements;
Contributor
Is item always evenly divided by this number? No need to handle tails somehow?

AD2605 (Contributor, Author) commented Mar 14, 2025

This is essentially a grid-stride loop: the total global range may be less than the number of elements, but each work-item strides by the global range, so all values are covered. It therefore handles all sizes automatically, since each work-item is responsible for one element at a time and is not vectorized.
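For illustration, a self-contained SYCL toy of the grid-stride pattern described here (a plain copy kernel, not the actual reorder kernel): the global range is deliberately smaller than, and not a divisor of, the element count, and every element is still visited exactly once.

    // Grid-stride loop sketch: work-items stride by the global range until all
    // elements are covered, so no separate tail handling is needed.
    #include <sycl/sycl.hpp>

    int main() {
        constexpr std::size_t num_elements = 1000003; // not a multiple of the range
        constexpr std::size_t global_range = 4096;

        sycl::queue q;
        float *src = sycl::malloc_shared<float>(num_elements, q);
        float *dst = sycl::malloc_shared<float>(num_elements, q);
        for (std::size_t i = 0; i < num_elements; ++i) src[i] = float(i);

        q.parallel_for(sycl::nd_range<1>{sycl::range<1>{global_range}, sycl::range<1>{256}},
                [=](sycl::nd_item<1> item) {
                    const std::size_t stride = item.get_global_range(0);
                    for (std::size_t i = item.get_global_id(0); i < num_elements; i += stride)
                        dst[i] = src[i]; // one element per iteration, no vectorization
                }).wait();

        sycl::free(src, q);
        sycl::free(dst, q);
        return 0;
    }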

bool zero_point_has_groups = !prb->attr.zero_points.is_def();
bool scales_has_groups = !prb->attr.scales.is_def();

if (is_blocked_format(prb->stag) || is_blocked_format(prb->dtag)
Contributor
Please convert is_blocked_format into a lambda since that's the only piece of code relying on the format. Though I'm not sure I share the sentiment that they can't support blocked layouts: there are functions that return the physical offset for a logical one, and they should do the trick.

Contributor Author
A separate declaration is intentional here, as we will be adding similar skips for the other primitives as well.

Though I'm not sure I share the sentiment that they can't support blocked layouts: there are functions that return the physical offset for a logical one, and they should do the trick.

I agree, but I believe the decision not to support blocked formats is driven by the fact that blocked formats are used for performance. Since the generic SYCL backend does not aim to be performant, it need not support them.

Contributor

we will be adding similar skips for the other primitives as well

Could you please share more details on what primitives and what kind of checks you are anticipating? Blocked layouts are supposed to be useful for weights only, and I'm almost sure they are requested through any only, and not through a direct blocked tag.

Contributor Author

The aim of this PR is to skip cases for the generic backend which return unimplemented in a benchdnn run (unimplemented cases are treated as failures), so that we can make the CI fully pass.

Could you please share more details on what primitives and what kind of checks you are anticipating?

Since we are not planning on adding blocked format support for the SYCL backend, this check will be added to each primitive.cpp in benchdnn.

I'm almost sure they are utilized through any only, and not through direct blocked tag.

benchdnn seems to test by explicitly passing --stag and --dtag (see test_reorder_all, for example), so whenever that test suite is run there would be reported failures (due to the primitive returning unimplemented), which will now be skipped after these changes.

Contributor

I get this part. Reorder working with explicit blocked tags is normal, as some functionality must move plain data to blocked layouts. I'm missing which other primitives will get the same kind of check and why; my question was targeting that clarification.

Contributor Author

I'm missing which other primitives will get the same kind of check and why; my question was targeting that clarification.

I was expecting other primitives to test with explicit blocked formats in the test_prim scripts, especially convolutions and matmuls, but it seems like there are none except reorder. For now, I can make this a lambda and add it back if I come across a test which explicitly mentions blocked formats.

Contributor Author
Made it into a lambda
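For illustration, a self-contained sketch of the kind of check and skip being discussed (the tag heuristic and the surrounding code are assumptions for the example, not the exact benchdnn change in this PR): benchdnn-style blocked tags such as aBx16b contain uppercase letters and block sizes, while plain tags such as abx or any do not, and cases using the former get reported as skipped rather than failing with unimplemented.

    // Toy stand-in for a benchdnn-side skip of blocked memory tags.
    #include <algorithm>
    #include <cctype>
    #include <cstdio>
    #include <string>

    int main() {
        // Local lambda, as requested in the review: true for blocked-looking tags.
        auto is_blocked_tag = [](const std::string &tag) {
            return std::any_of(tag.begin(), tag.end(), [](unsigned char c) {
                return std::isupper(c) || std::isdigit(c);
            });
        };

        const std::string cases[] = {"abx", "any", "aBx16b", "ABc4a4b"};
        for (const auto &tag : cases) {
            if (is_blocked_tag(tag))
                std::printf("%-8s -> SKIPPED (blocked layouts not supported)\n", tag.c_str());
            else
                std::printf("%-8s -> run\n", tag.c_str());
        }
        return 0;
    }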

if (is_generic_gpu(engine) || is_nvidia_gpu(engine)
|| is_amd_gpu(engine)) {
const bool is_4bit_format = is_nibble_sized_type(prb->sdt)
|| is_nibble_sized_type(prb->ddt);
Contributor
Please rename into is_subbyte_type and move to utils/numerics.{c,h}pp files to unify with other numeric functions.
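For reference, a minimal sketch of what such a helper could look like; which sub-byte enumerators exist depends on the oneDNN version, so the list below is an assumption and may need adjusting.

    // Hypothetical is_subbyte_type helper for benchdnn's utils/numerics.
    #include "oneapi/dnnl/dnnl_types.h"

    inline bool is_subbyte_type(dnnl_data_type_t dt) {
        switch (dt) {
            case dnnl_s4:
            case dnnl_u4:
            case dnnl_f4_e2m1: return true; // 4-bit types occupy half a byte
            default: return false;
        }
    }

    // Usage at the skip site: is_subbyte_type(prb->sdt) || is_subbyte_type(prb->ddt)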

Contributor Author
Done

Co-authored-by: Dmitry Zarukin <dmitry.zarukin@intel.com>
@AD2605 AD2605 force-pushed the atharva/reorder_blocked_format branch from 9008b9d to fdcfa7a on March 17, 2025 15:27
AD2605 (Contributor, Author) commented Mar 17, 2025

make test
disable build_vendor_intel
disable build_vendor_intel
disable test_device_cpu
enable build_vendor_generic
enable arch_gpu_ampere
enable compiler_icx-oss

Labels
component:tests Codeowner: @oneapi-src/onednn-arch platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic
3 participants