ref: sycl: enable zp support in reorder and skip unsupported tests #2877

Open

AD2605 wants to merge 1 commit into main from atharva/reorder_blocked_format

Conversation

AD2605 (Contributor) commented Mar 13, 2025

Description

This PR enables zero-point support in the SYCL backend's reorder and adds skips in benchdnn for cases that the SYCL backend does not support.
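For context, this is roughly how a user requests a zero point on a reorder through the public oneDNN C++ API. The sketch below is illustrative only: shapes, data types, and the zero-point value are made up, and whether the generic SYCL backend accepts this exact configuration is precisely what this PR is about; it is not code from the PR.

    // Illustrative sketch: a reorder with a runtime destination zero point.
    #include "oneapi/dnnl/dnnl.hpp"

    int main() {
        using namespace dnnl;
        engine eng(engine::kind::gpu, 0); // a SYCL GPU engine is assumed here
        stream strm(eng);

        memory::desc src_md({8, 16}, memory::data_type::f32, memory::format_tag::ab);
        memory::desc dst_md({8, 16}, memory::data_type::s8, memory::format_tag::ab);
        memory src(src_md, eng), dst(dst_md, eng); // src contents omitted for brevity

        // Request a single (mask = 0) zero point on DST, supplied at execution time.
        primitive_attr attr;
        attr.set_zero_points_mask(DNNL_ARG_DST, /* mask = */ 0);

        reorder::primitive_desc rpd(eng, src_md, eng, dst_md, attr);
        reorder r(rpd);

        // The zero point itself is passed as a 1-element s32 memory at execution.
        memory::desc zp_md({1}, memory::data_type::s32, memory::format_tag::x);
        memory zp_mem(zp_md, eng);
        void *p = zp_mem.map_data();
        static_cast<int32_t *>(p)[0] = 3;
        zp_mem.unmap_data(p);

        r.execute(strm, {{DNNL_ARG_FROM, src}, {DNNL_ARG_TO, dst},
                         {DNNL_ARG_ATTR_ZERO_POINTS | DNNL_ARG_DST, zp_mem}});
        strm.wait();
        return 0;
    }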

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@AD2605 AD2605 requested review from a team as code owners March 13, 2025 17:48
@github-actions github-actions bot added platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic component:tests Codeowner: @oneapi-src/onednn-arch labels Mar 13, 2025
@AD2605 AD2605 force-pushed the atharva/reorder_blocked_format branch from acca961 to 9cf8e34 on March 13, 2025 17:51
@github-actions github-actions bot removed the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Mar 13, 2025
@AD2605 AD2605 force-pushed the atharva/reorder_blocked_format branch from 9cf8e34 to 886bdc7 on March 13, 2025 17:52
AD2605 (Contributor, Author) commented Mar 13, 2025

make test
disable benchdnn_all
disable build_vendor_intel
disable build_vendor_intel
disable test_device_cpu
enable arch_gpu_ampere
enable benchdnn_reorder

@AD2605 AD2605 changed the title from "ref: sycl: enable zp support in reorder and skip unsupported benchdnn" to "ref: sycl: enable zp support in reorder and skip unsupported tests" on Mar 13, 2025
@atkassen atkassen removed the request for review from a team March 13, 2025 19:10
// returning false unnecessarily
const auto src_scale_dt = attr()->scales_.get_data_type(
DNNL_ARG_ATTR_SCALES | DNNL_ARG_FROM);
if (!is_supported_type(src_scale_dt)) { return false; }
Contributor
Minor: add verbose support for this line.

Contributor Author
The verbose is handled above; this function call is wrapped in the VDISPATCH_REORDER macro,
see https://github.com/uxlfoundation/oneDNN/blob/1ae1905345983d74c302ac12f8d9f9ada7022ab5/src/gpu/generic/sycl/ref_reorder.hpp#L63C1-L64C51

This change just fixes a bug in that function.

dzarukin (Contributor) commented Mar 14, 2025
It's generally better to lower it from the upper level to the exact places of failing, instead of having to figure out which condition triggered it, especially if it's more than one.
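For illustration, here is a small self-contained toy of the pattern being suggested: each condition carries its own diagnostic at the point where it fails, so the user does not have to guess which check rejected the case. The CHECK_OR_REPORT macro below is only a stand-in for oneDNN's VDISPATCH_REORDER-style macros, not the real one.

    // Toy version of "report the exact failing condition", not oneDNN code.
    #include <cstdio>

    #define CHECK_OR_REPORT(cond, msg) \
        do { \
            if (!(cond)) { \
                std::printf("dispatch failed: %s\n", (msg)); \
                return false; \
            } \
        } while (0)

    enum class data_type { f32, f16, s8, s4 };

    static bool is_supported_type(data_type dt) {
        return dt == data_type::f32 || dt == data_type::f16 || dt == data_type::s8;
    }

    // Each check has its own message, so the log names the condition that failed.
    bool check_scales(data_type src_scale_dt, data_type dst_scale_dt) {
        CHECK_OR_REPORT(is_supported_type(src_scale_dt), "unsupported src scale data type");
        CHECK_OR_REPORT(is_supported_type(dst_scale_dt), "unsupported dst scale data type");
        return true;
    }

    int main() {
        check_scales(data_type::f32, data_type::s4); // prints the dst-scale message
        return 0;
    }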

DNNL_ARG_ATTR_SCALES | DNNL_ARG_TO)) {
const auto dst_scale_dt = attr()->scales_.get_data_type(
DNNL_ARG_ATTR_SCALES | DNNL_ARG_TO);
if (!is_supported_type(dst_scale_dt)) { return false; }
Contributor
The same comments apply to this section.

Comment on lines 36 to 37
// To cover cases when src has more padding than destination, in that case
// simply setting the range as dst_nelems does not cover all the source elements
Contributor
I'm not sure I'm following the comment. Could you re-phrase, please?

Contributor Author
I'll remove this comment; it is a leftover from supporting blocked formats in reorder and no longer holds its meaning.

Contributor Author
Removed

float dst_zp = conf_.apply_dst_zp ? dst_zp_tensor.load(0) : 0;

for (std::size_t flattened_index = item.get_global_id(0);
flattened_index < conf_.num_elements;
Contributor
Is item always evenly divided by this number? No need to handle tails somehow?

AD2605 (Contributor, Author) commented Mar 14, 2025

This is essentially a grid-stride loop: the total global range may be less than the number of elements, but each work-item strides by the global range, so all values are covered. It therefore handles all sizes automatically, since each work-item is responsible for one element at a time and is not vectorized.
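For illustration, a self-contained SYCL toy of the grid-stride pattern described here (a plain copy kernel, not the actual reorder kernel): the global range is deliberately smaller than, and not a divisor of, the element count, and every element is still visited exactly once.

    // Grid-stride loop sketch: work-items stride by the global range until all
    // elements are covered, so no separate tail handling is needed.
    #include <sycl/sycl.hpp>

    int main() {
        constexpr std::size_t num_elements = 1000003; // not a multiple of the range
        constexpr std::size_t global_range = 4096;

        sycl::queue q;
        float *src = sycl::malloc_shared<float>(num_elements, q);
        float *dst = sycl::malloc_shared<float>(num_elements, q);
        for (std::size_t i = 0; i < num_elements; ++i) src[i] = float(i);

        q.parallel_for(sycl::nd_range<1>{sycl::range<1>{global_range}, sycl::range<1>{256}},
                [=](sycl::nd_item<1> item) {
                    const std::size_t stride = item.get_global_range(0);
                    for (std::size_t i = item.get_global_id(0); i < num_elements; i += stride)
                        dst[i] = src[i]; // one element per iteration, no vectorization
                }).wait();

        sycl::free(src, q);
        sycl::free(dst, q);
        return 0;
    }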

bool zero_point_has_groups = !prb->attr.zero_points.is_def();
bool scales_has_groups = !prb->attr.scales.is_def();

if (is_blocked_format(prb->stag) || is_blocked_format(prb->dtag)
Contributor
Please convert is_blocked_format into a lambda since that's the only piece of code relying on the format. Though I'm not sure I share the sentiment that they can't support blocked layouts: there are functions that return the physical offset for a logical one, and they should do the trick.

Contributor Author
A separate declaration is intentional here, as we will be adding similar skips for the other primitives as well.

Though I'm not sure I share the sentiment that they can't support blocked layouts: there are functions that return the physical offset for a logical one, and they should do the trick.

I agree, but I believe the decision not to support blocked formats is driven by the fact that blocked formats are used for performance. Since the generic SYCL backend does not aim to be performant, it need not support them.

Contributor

we will be adding similar skips for the other primitives as well

Could you please share more details on what primitives and what kind of checks you are anticipating? Blocked layouts are supposed to be useful for weights only, and I'm almost sure they are requested through any only, and not through a direct blocked tag.

Contributor Author

The aim of this PR is to skip cases for the generic backend which return unimplemented in a benchdnn run (unimplemented cases are treated as failures), so that we can make the CI fully pass.

Could you please share more details on what primitives and what kind of checks you are anticipating?

Since we are not planning on adding blocked format support for the SYCL backend, this check will be added to each primitive.cpp in benchdnn.

I'm almost sure they are utilized through any only, and not through direct blocked tag.

benchdnn seems to test by explicitly passing --stag and --dtag (see test_reorder_all, for example), so whenever that test suite is run there would be reported failures (due to the primitive returning unimplemented), which will now be skipped after these changes.

Contributor

I get this part. Reorder working with explicit blocked tags is normal, as some functionality must move plain data to blocked layouts. I'm missing which other primitives will get the same kind of check and why; my question was targeting that clarification.

Contributor Author

I'm missing which other primitives will get the same kind of check and why; my question was targeting that clarification.

I was expecting other primitives to test with explicit blocked formats in the test_prim scripts, especially convolutions and matmuls, but it seems like there are none except reorder. For now, I can make this a lambda and add it back if I come across a test which explicitly mentions blocked formats.

Contributor Author
Made it into a lambda
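For illustration, a self-contained sketch of the kind of check and skip being discussed (the tag heuristic and the surrounding code are assumptions for the example, not the exact benchdnn change in this PR): benchdnn-style blocked tags such as aBx16b contain uppercase letters and block sizes, while plain tags such as abx or any do not, and cases using the former get reported as skipped rather than failing with unimplemented.

    // Toy stand-in for a benchdnn-side skip of blocked memory tags.
    #include <algorithm>
    #include <cctype>
    #include <cstdio>
    #include <string>

    int main() {
        // Local lambda, as requested in the review: true for blocked-looking tags.
        auto is_blocked_tag = [](const std::string &tag) {
            return std::any_of(tag.begin(), tag.end(), [](unsigned char c) {
                return std::isupper(c) || std::isdigit(c);
            });
        };

        const std::string cases[] = {"abx", "any", "aBx16b", "ABc4a4b"};
        for (const auto &tag : cases) {
            if (is_blocked_tag(tag))
                std::printf("%-8s -> SKIPPED (blocked layouts not supported)\n", tag.c_str());
            else
                std::printf("%-8s -> run\n", tag.c_str());
        }
        return 0;
    }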

if (is_generic_gpu(engine) || is_nvidia_gpu(engine)
|| is_amd_gpu(engine)) {
const bool is_4bit_format = is_nibble_sized_type(prb->sdt)
|| is_nibble_sized_type(prb->ddt);
Contributor
Please rename into is_subbyte_type and move to utils/numerics.{c,h}pp files to unify with other numeric functions.
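For reference, a minimal sketch of what such a helper could look like; which sub-byte enumerators exist depends on the oneDNN version, so the list below is an assumption and may need adjusting.

    // Hypothetical is_subbyte_type helper for benchdnn's utils/numerics.
    #include "oneapi/dnnl/dnnl_types.h"

    inline bool is_subbyte_type(dnnl_data_type_t dt) {
        switch (dt) {
            case dnnl_s4:
            case dnnl_u4:
            case dnnl_f4_e2m1: return true; // 4-bit types occupy half a byte
            default: return false;
        }
    }

    // Usage at the skip site: is_subbyte_type(prb->sdt) || is_subbyte_type(prb->ddt)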

Contributor Author
Done

Co-authored-by: Dmitry Zarukin <dmitry.zarukin@intel.com>
@AD2605 AD2605 force-pushed the atharva/reorder_blocked_format branch from 9008b9d to fdcfa7a on March 17, 2025 15:27
AD2605 (Contributor, Author) commented Mar 17, 2025

make test
disable build_vendor_intel
disable build_vendor_intel
disable test_device_cpu
enable build_vendor_generic
enable arch_gpu_ampere
enable compiler_icx-oss

Labels
component:tests Codeowner: @oneapi-src/onednn-arch platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic
3 participants