-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref: sycl: enable zp support in reorder and skip unsupported tests #2877
base: main
Are you sure you want to change the base?
Conversation
acca961
to
9cf8e34
Compare
9cf8e34
to
886bdc7
Compare
make test |
// returning false unecessarily | ||
const auto src_scale_dt = attr()->scales_.get_data_type( | ||
DNNL_ARG_ATTR_SCALES | DNNL_ARG_FROM); | ||
if (!is_supported_type(src_scale_dt)) { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: add verbose support for this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The verbose is handled above, this function call is wrapped in the VDISPTACH_REORDER
macro,
see- https://github.com/uxlfoundation/oneDNN/blob/1ae1905345983d74c302ac12f8d9f9ada7022ab5/src/gpu/generic/sycl/ref_reorder.hpp#L63C1-L64C51
This change just fixes a bug in that function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's generally better to lower it from upper level to the exact places of failing instead of figuring out which condition triggered it, especially, if it's not one.
DNNL_ARG_ATTR_SCALES | DNNL_ARG_TO)) { | ||
const auto dst_scale_dt = attr()->scales_.get_data_type( | ||
DNNL_ARG_ATTR_SCALES | DNNL_ARG_TO); | ||
if (!is_supported_type(dst_scale_dt)) { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments to this section.
src/gpu/generic/sycl/ref_reorder.cpp
Outdated
// To cover cases when src has more padding than destination, in that case | ||
// simply setting the range as dst_nelems does not cover all the source elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm following the comment. Could you re-phrase, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove this comment, these are reminiscent of supporting blocked format in reorder, and does not hold it's meaning any more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
float dst_zp = conf_.apply_dst_zp ? dst_zp_tensor.load(0) : 0; | ||
|
||
for (std::size_t flattened_index = item.get_global_id(0); | ||
flattened_index < conf_.num_elements; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is item
always nicely divided with this number? No need to handle tails somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is essentially a grid strided loop, the total global_range
may be lesser than the number of elements, but it strides with a global_range
index, covering all the values. Thus it automatically handles all sizes, as each workitem is only responsible for one element at a time, and is not vectorized
bool zero_point_has_groups = !prb->attr.scales.is_def(); | ||
bool scales_has_groups = !prb->attr.zero_points.is_def(); | ||
|
||
if (is_blocked_format(prb->stag) || is_blocked_format(prb->dtag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please convert is_blocked_format
into lambda since that's the only piece of code relying on the format. Though I'm sharing the passion they can't support blocked layouts. There are functions that return physical offset by logical one and it should do the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a separate declaration is intentional here, as we will be adding similar skips for the other primitives as well.
Though I'm sharing the passion they can't support blocked layouts. There are functions that return physical offset by logical one and it should do the trick.
I agree, but I believe the decision to not support blocked formats is driven by the fact that blocked formats are used for performance. Since the generic SYCL backend does not aim to be performant, it need not support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will be adding similar skips for the other primitives as well
Could you, please, share more details on what primitives and what kind of checks you are anticipating? Blocked layout are supposed to be useful for weights only and I'm almost sure they are utilized through any
only, and not through direct blocked tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aim of this PR is to skip cases for the generic backend which return unimplemented in a benchdnn run, as unimplemented are treated as failures , so that we can make the CI fully passing.
Could you, please, share more details on what primitives and what kind of checks you are anticipating
Since we are not planning on adding blocked format support for the SYCL backend, this check will be added to each primitive.cpp
in benchdnn.
I'm almost sure they are utilized through any only, and not through direct blocked tag.
benchdnn
seems to test by explicitly passing --stag
and --dtag
(see test_reorder_all for example), so whenever that test suite is run, there would be reported failures (due to primitive returning unimplemented) which will now be skipped after these changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get this part. Reorder working with explicit blocked tags is normal as some functionality must move plain data to blocked. I'm missing which other primitives will get same kind of check and why, my question was targeting that clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing which other primitives will get same kind of check and why, my question was targeting that clarification.
I was expecting other primitives to test with explicit blocked formats in the test_prim
scripts, especially convolutions and matmuls, but it seems like there are none, except reorder. For now, I can make this a lambda, and add it back in case I come across a test which explicitly mentions blocked formats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it into a lambda
tests/benchdnn/reorder/reorder.cpp
Outdated
if (is_generic_gpu(engine) || is_nvidia_gpu(engine) | ||
|| is_amd_gpu(engine)) { | ||
const bool is_4bit_format = is_nibble_sized_type(prb->sdt) | ||
|| is_nibble_sized_type(prb->ddt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename into is_subbyte_type
and move to utils/numerics.{c,h}pp files to unify with other numeric functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Co-authored-by: Dmitry Zarukin <dmitry.zarukin@intel.com>
9008b9d
to
fdcfa7a
Compare
make test |
Description
This PR enables zero point support to the SYCL backend's reorder, and adds skips in benchdnn to skip unsupported cases for the SYCL backend
General
make test
andmake test_benchdnn_*
) pass locally for each commit?