Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Use L0 queries for L0 devices #2245

Merged
merged 5 commits into from
Jan 14, 2025

Conversation

mgouicem
Copy link
Contributor

@mgouicem mgouicem commented Dec 9, 2024

This PR replaces OCL queries with matching L0 queries for L0 devices on Intel devices.
Few notable things:

  • Updated L0 headers to access some recent queries
  • Added headers for Intel specific L0 extensions. These headers come from compute-runtime. We asked about forward compatibility status of these symbols and are waiting for clarification (hence why this PR is in draft). Driver is backward compatible for a given major version of the level_zero driver, so we should be on the same level of compatibility as non-Intel specific functions.
  • switched ngen to dynamically load L0

@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Dec 9, 2024
@mgouicem mgouicem requested a review from a team December 9, 2024 10:48
@mgouicem
Copy link
Contributor Author

mgouicem commented Dec 9, 2024

make test
disable device_cpu
enable device_gpu
enable thr_sycl
enable thr_ocl

@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch 2 times, most recently from e0dc26f to 2c29101 Compare December 9, 2024 12:35
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from 2c29101 to 6ef7b93 Compare December 10, 2024 09:39
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from 6ef7b93 to 916aff8 Compare December 11, 2024 12:41
@github-actions github-actions bot added the devops Github automation label Dec 11, 2024
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch 2 times, most recently from aa79d8d to 9c03e06 Compare December 11, 2024 13:26
= {ZE_STRUCTURE_INTEL_DEVICE_MODULE_DP_EXP_PROPERTIES};
deviceModProps.pNext = &deviceModPropsExt;

CHECK(func_zeDeviceGetModuleProperties(device, &deviceModProps));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a driver support matrix? Users may have pretty old drivers (esp. on Linux), so I'm wondering if we need a fallback path here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backward compatibility, we validate only last driver available at time of release (see README). In this particular case, this query appeared on Oct 31st 2023 (99abb40a)

For forward compatibility, I am working on getting clarification (and this PR will remain draft in the meantime).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I clarified the backward compatibility policy with Level Zero driver team. The extensions are backward compatible for a given major version of level zero API. So this is similar to regular (non-extensions) level zero APIs we currently use, and we should be fine with future driver updates.

However, this extension was introduced with 23.48.27912.11 release, so this would become a requirement to our users. Do you think this could be an issue for some users?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this extension was introduced with 23.48.27912.11 release, so this would become a requirement to our users. Do you think this could be an issue for some users?

I don't have any hard data on the range of L0 driver versions users have in the wild, but it seems risky to drop support for drivers more than 1 year old, especially as we have been gracefully supporting much older drivers prior to this PR. At least, let's not fail engine/primitive initialization if this call fails. We could conservatively report "no systolic support" for these older drivers. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • @vpirogov. Do we need/have a policy wrt to number of drivers oneDNN supports?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oneAPI policy is supporting the latest LTS and the latest rolling driver releases. I don't have concerns with dropping support for drivers >1 year old. Though I agree with @petercad's comment that we should fail gracefully if the driver is not supported. We could either assume that query returned negative or throw an error indicating that driver is not supported.

Copy link
Contributor Author

@mgouicem mgouicem Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I agree with @petercad's comment that we should fail gracefully if the driver is not supported.

I guess it depends how we define gracefully. In the current state, the engine fails to create and an error message is printed upon the failure to query a hw property. This is aligned with propagating query status to engine creation status.

The alternative suggested by Peter is to not fail engine creation upon failed query, but just proceed with disabled systolic (or atomics) support. The error message would still be printed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifications. I don't have preference between these two options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is specific to SYCL configuration the change will not affect OpenVINO. Pytorch is relying on LTS driver, which does have queries support. With that we are ok to go with the changes as is.

@mgouicem mgouicem marked this pull request as ready for review December 19, 2024 13:32
@mgouicem mgouicem requested review from a team as code owners December 19, 2024 13:32
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from 9c03e06 to 6a4fee2 Compare December 19, 2024 14:07
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from 6a4fee2 to 2ea8ccc Compare December 19, 2024 21:07
@github-actions github-actions bot removed the devops Github automation label Dec 19, 2024
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from 2ea8ccc to 814e8bc Compare December 19, 2024 21:15
@mgouicem mgouicem requested review from petercad and removed request for a team December 19, 2024 21:15
@mgouicem
Copy link
Contributor Author

make test
disable device_cpu
enable device_gpu
enable thr_sycl
enable thr_ocl

@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from 814e8bc to 6bf40c3 Compare January 6, 2025 14:27
@mgouicem mgouicem force-pushed the mgouicem/main/l0_queries branch from 6bf40c3 to 7ea8854 Compare January 9, 2025 13:21
@mgouicem mgouicem merged commit dc830ad into uxlfoundation:main Jan 14, 2025
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants