-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Use L0 queries for L0 devices #2245
[SYCL] Use L0 queries for L0 devices #2245
Conversation
make test |
e0dc26f
to
2c29101
Compare
2c29101
to
6ef7b93
Compare
6ef7b93
to
916aff8
Compare
aa79d8d
to
9c03e06
Compare
= {ZE_STRUCTURE_INTEL_DEVICE_MODULE_DP_EXP_PROPERTIES}; | ||
deviceModProps.pNext = &deviceModPropsExt; | ||
|
||
CHECK(func_zeDeviceGetModuleProperties(device, &deviceModProps)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a driver support matrix? Users may have pretty old drivers (esp. on Linux), so I'm wondering if we need a fallback path here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I clarified the backward compatibility policy with Level Zero driver team. The extensions are backward compatible for a given major version of level zero API. So this is similar to regular (non-extensions) level zero APIs we currently use, and we should be fine with future driver updates.
However, this extension was introduced with 23.48.27912.11 release, so this would become a requirement to our users. Do you think this could be an issue for some users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this extension was introduced with 23.48.27912.11 release, so this would become a requirement to our users. Do you think this could be an issue for some users?
I don't have any hard data on the range of L0 driver versions users have in the wild, but it seems risky to drop support for drivers more than 1 year old, especially as we have been gracefully supporting much older drivers prior to this PR. At least, let's not fail engine/primitive initialization if this call fails. We could conservatively report "no systolic support" for these older drivers. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- @vpirogov. Do we need/have a policy wrt to number of drivers oneDNN supports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oneAPI policy is supporting the latest LTS and the latest rolling driver releases. I don't have concerns with dropping support for drivers >1 year old. Though I agree with @petercad's comment that we should fail gracefully if the driver is not supported. We could either assume that query returned negative or throw an error indicating that driver is not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I agree with @petercad's comment that we should fail gracefully if the driver is not supported.
I guess it depends how we define gracefully. In the current state, the engine fails to create and an error message is printed upon the failure to query a hw property. This is aligned with propagating query status to engine creation status.
The alternative suggested by Peter is to not fail engine creation upon failed query, but just proceed with disabled systolic (or atomics) support. The error message would still be printed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifications. I don't have preference between these two options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is specific to SYCL configuration the change will not affect OpenVINO. Pytorch is relying on LTS driver, which does have queries support. With that we are ok to go with the changes as is.
9c03e06
to
6a4fee2
Compare
6a4fee2
to
2ea8ccc
Compare
2ea8ccc
to
814e8bc
Compare
make test |
814e8bc
to
6bf40c3
Compare
6bf40c3
to
7ea8854
Compare
This PR replaces OCL queries with matching L0 queries for L0 devices on Intel devices.
Few notable things:
We asked about forward compatibility status of these symbols and are waiting for clarification (hence why this PR is in draft). Driver is backward compatible for a given major version of the level_zero driver, so we should be on the same level of compatibility as non-Intel specific functions.