performance: add low latency hint for exec queues #794

Open: wants to merge 3 commits into base: master
smorek-intel (Contributor)

When workloads submit compute kernels, low submission latency is required.
Submission overhead has a significant impact in a burst model,
where the application continuously submits short-running kernels.

Signed-off-by: Szymon Morek <szymon.morek@intel.com>
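In a burst model, the per-kernel submission overhead is paid on every launch, so its share of total time grows as kernels get shorter. A minimal C sketch of that arithmetic, assuming a fixed overhead per submission that is not overlapped with kernel execution (a simplifying assumption, not a measurement from this PR); the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Total wall time (us) for a burst of n back-to-back kernels, assuming a
 * fixed per-submission overhead that is NOT overlapped with execution. */
static double burst_time_us(size_t n, double overhead_us, double runtime_us)
{
    assert(overhead_us >= 0.0 && runtime_us >= 0.0);
    return (double)n * (overhead_us + runtime_us);
}

/* Fraction of the burst spent on submission overhead rather than compute.
 * Under this model it does not depend on the number of kernels. */
static double overhead_share(double overhead_us, double runtime_us)
{
    assert(overhead_us + runtime_us > 0.0);
    return overhead_us / (overhead_us + runtime_us);
}
```

For example, with a hypothetical 10 us kernel and 90 us of overhead, `overhead_share(90.0, 10.0)` is 0.9, and a burst of 1000 such kernels takes `burst_time_us(1000, 90.0, 10.0)` = 100000 us, of which only 10000 us is actual compute.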
MichalMrozek left a comment

lgtm

MichalMrozek marked this pull request as ready for review on January 9, 2025, 09:40
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 9, 2025
Allow user to provide a low latency hint per exec queue. When set,
KMD sends a hint to GuC which results in special handling for this
exec queue. SLPC will ramp the GT frequency aggressively every time
it switches to this exec queue.

We need to enable the SLPC compute strategy during init, but it
applies only to exec queues that set this bit at exec queue
creation.
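Conceptually, the strategy is enabled globally but takes effect per queue: only a switch to a flagged exec queue triggers the aggressive ramp. A toy C model of that selection, assuming the ramp can be simplified to "jump to the maximum frequency"; the real decision is made inside GuC/SLPC firmware, and the struct, field, and function names below are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of an exec queue as seen by a frequency policy. */
struct toy_queue {
    bool low_latency_hint; /* set by userspace at exec queue creation */
};

/* Toy policy: on a switch to a hinted queue, go straight to the max
 * frequency; otherwise keep the frequency chosen by the default heuristic. */
static unsigned toy_freq_on_switch(const struct toy_queue *q,
                                   unsigned default_mhz, unsigned max_mhz)
{
    assert(q != NULL && default_mhz <= max_mhz);
    return q->low_latency_hint ? max_mhz : default_mhz;
}
```

Queues created without the hint keep the default behavior, which is why enabling the strategy at init does not affect them.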

The improvement with this approach is shown below:

Before:

:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
  Device: Intel(R) Graphics [0xe20b]
    Driver version  : 24.52.0 (Linux x64)
    Compute units   : 160
    Clock frequency : 2850 MHz
    Kernel launch latency : 283.16 us

After:

:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
  Device: Intel(R) Graphics [0xe20b]
    Driver version  : 24.52.0 (Linux x64)
    Compute units   : 160
    Clock frequency : 2850 MHz

    Kernel launch latency : 63.38 us
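To express the two reported kernel launch latencies in relative terms, a small C sketch of the arithmetic; the macro and function names are illustrative only:

```c
#include <assert.h>

/* Kernel launch latencies reported by clpeak --kernel-latency above (us). */
#define LATENCY_BEFORE_US 283.16
#define LATENCY_AFTER_US   63.38

/* How many times lower the "after" launch latency is. */
static double speedup(double before_us, double after_us)
{
    assert(after_us > 0.0);
    return before_us / after_us;
}

/* Relative latency reduction, as a fraction of the "before" value. */
static double reduction(double before_us, double after_us)
{
    assert(before_us > 0.0);
    return (before_us - after_us) / before_us;
}
```

With these inputs, `speedup` gives about 4.47x and `reduction` about 0.776, i.e. roughly a 77.6% lower launch latency.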

The UMD indicates the low latency hint through the flags field, as shown below:

    struct drm_xe_exec_queue_create exec_queue_create = {
        /* low_latency: the caller's choice; pass 0 to omit the hint */
        .flags = low_latency ? DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT : 0,
        .extensions = 0,
        .vm_id = vm,
        .width = 1,            /* formerly num_bb_per_exec */
        .num_placements = 1,   /* formerly num_eng_per_bb */
        .instances = to_user_pointer(&instance),
    };
    ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &exec_queue_create);
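A UMD would typically request the hint only when the running kernel supports it (V3 adds a query config for this check), which avoids passing a flag bit that an older kernel may reject. A minimal C sketch of that decision; `HYPOTHETICAL_LOW_LATENCY_HINT` stands in for DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT and its bit value is an assumption, and how `kernel_supports_hint` is obtained from the query is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT from xe_drm.h.
 * ASSUMPTION: bit position chosen for illustration only. */
#define HYPOTHETICAL_LOW_LATENCY_HINT (1u << 0)

/* A hint flag must be a single bit so it can be OR-ed with other flags. */
static_assert((HYPOTHETICAL_LOW_LATENCY_HINT &
               (HYPOTHETICAL_LOW_LATENCY_HINT - 1u)) == 0,
              "hint must be a single bit");

/* Compute exec queue creation flags: request the hint only when the
 * workload wants low submission latency AND the kernel reports support. */
static uint32_t exec_queue_flags(bool want_low_latency, bool kernel_supports_hint)
{
    uint32_t flags = 0;
    if (want_low_latency && kernel_supports_hint)
        flags |= HYPOTHETICAL_LOW_LATENCY_HINT;
    return flags;
}
```

The returned value would then be placed in `.flags` of `struct drm_xe_exec_queue_create` before the DRM_IOCTL_XE_EXEC_QUEUE_CREATE call.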

Link to UMD PR: intel/compute-runtime#794

Note: There is an outstanding issue on the GuC side: it is not able to switch
to the maximum frequency according to the strategy indicated by the KMD. For
the experiment/test results above, a hardcoded approach was taken and passed
to the GuC as a policy. Debugging on the GuC side is ongoing in parallel.

V2:
  - Value 1 for DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT is already planned for another hint (Szymon)
  - Add motivation to the description (Lucas)

Cc: dri-devel@lists.freedesktop.org
Cc: vinay.belgaumkar@intel.com
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Szymon Morek <szymon.morek@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 10, 2025
V3:
  - Convert the user flag to a kernel-internal flag and use that (Oak)
  - Support a query config so users can check for kernel support (Jose)
  - No need to take a runtime PM reference (Vinay)
V2:
  - Value 1 for DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT is already planned for another hint (Szymon)
  - Add motivation to the description (Lucas)

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
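The first V3 item separates the uAPI flag from the flag the driver stores internally, so the kernel can validate user input while keeping its own bit layout. A minimal C sketch of such a translation step; every name and bit value here is hypothetical and not taken from the xe driver:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uAPI bit (stand-in for DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT). */
#define TOY_UAPI_LOW_LATENCY_HINT (1u << 0)
/* Hypothetical kernel-internal bit; internal layout may differ from uAPI. */
#define TOY_INTERNAL_LOW_LATENCY  (1u << 5)

/* Translate user-supplied creation flags into internal flags.
 * Returns 0 on success, or -EINVAL if any unknown uAPI bit is set. */
static int toy_convert_flags(uint32_t user_flags, uint32_t *internal)
{
    assert(internal != NULL);
    if (user_flags & ~TOY_UAPI_LOW_LATENCY_HINT)
        return -EINVAL; /* reject bits the uAPI does not define */
    *internal = 0;
    if (user_flags & TOY_UAPI_LOW_LATENCY_HINT)
        *internal |= TOY_INTERNAL_LOW_LATENCY;
    return 0;
}
```

Keeping the two bit spaces separate lets the internal flag move or be combined with driver-only flags without changing the uAPI.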