xe: sdpa: improve performance of quantized sdpa with a head size of 64 #2921

umar456 · 2025-03-19T20:21:17Z

Description

This PR updates the SDPA kernel configurations to optimize for a head size of 64. These updates improve performance for small head sizes by 1.1-1.5x on LNL. Performance on other platforms will be posted soon.

| mb |  N |  D |   KV |    Q | kdt       | vdt       | mask   | quant                 | sdpa(main) | sdpa(PR) | speedup vs. main |
|----+----+----+------+------+-----------+-----------+--------+-----------------------+------------+----------+------------------|
|  1 | 32 | 64 |  385 |    1 | s8/f16/na | s8/f16/na | causal | per_token_with_groups |      80.75 |     62.5 |            1.292 |
|  1 | 32 | 64 |  513 |    1 | s8/f16/na | s8/f16/na | causal | per_token_with_groups |      91.96 |    70.78 |            1.299 |
|  1 | 32 | 64 | 1025 |    1 | s8/f16/na | s8/f16/na | causal | per_token_with_groups |     122.32 |   101.19 |            1.209 |
|  1 | 32 | 64 | 2049 |    1 | s8/f16/na | s8/f16/na | causal | per_token_with_groups |     216.77 |   154.16 |            1.406 |
|  1 | 32 | 64 | 4097 |    1 | s8/f16/na | s8/f16/na | causal | per_token_with_groups |     398.24 |    257.9 |            1.544 |
|----+----+----+------+------+-----------+-----------+--------+-----------------------+------------+----------+------------------|
|  1 | 32 | 64 |  384 |  384 | s8/f16/na | s8/f16/na | causal | per_token_with_groups |     402.67 |   299.89 |            1.343 |
|  1 | 32 | 64 |  512 |  512 | s8/f16/na | s8/f16/na | causal | per_token_with_groups |     512.16 |   458.93 |            1.116 |
|  1 | 32 | 64 | 1024 | 1024 | s8/f16/na | s8/f16/na | causal | per_token_with_groups |    1801.07 |  1616.98 |            1.114 |
|  1 | 32 | 64 | 2048 | 2048 | s8/f16/na | s8/f16/na | causal | per_token_with_groups |    9421.45 |  6120.72 |            1.539 |
|  1 | 32 | 64 | 4096 | 4096 | s8/f16/na | s8/f16/na | causal | per_token_with_groups |    26456.8 |  23432.6 |            1.129 |
#+TBLFM: $12=$10/$11
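
The table formula `$12=$10/$11` divides the `sdpa(main)` time by the `sdpa(PR)` time, so values above 1 mean the PR is faster. As a rough cross-check of the 1.1-1.5x claim, the column can be recomputed with a short Python sketch. The timings are copied from the table; the `ROWS` and `speedup` names are illustrative only and are not part of oneDNN, and the time unit is not stated in the PR.

```python
# (KV, Q, sdpa(main), sdpa(PR)) timings copied from the table above.
# The time unit is not stated in the PR, so only the ratio is meaningful here.
ROWS = [
    (385, 1, 80.75, 62.5),
    (513, 1, 91.96, 70.78),
    (1025, 1, 122.32, 101.19),
    (2049, 1, 216.77, 154.16),
    (4097, 1, 398.24, 257.9),
    (384, 384, 402.67, 299.89),
    (512, 512, 512.16, 458.93),
    (1024, 1024, 1801.07, 1616.98),
    (2048, 2048, 9421.45, 6120.72),
    (4096, 4096, 26456.8, 23432.6),
]


def speedup(main_time, pr_time):
    """Baseline time divided by PR time (same as $12=$10/$11)."""
    return main_time / pr_time


speedups = [speedup(main, pr) for _, _, main, pr in ROWS]
lo, hi = min(speedups), max(speedups)

for (kv, q, _, _), s in zip(ROWS, speedups):
    print(f"KV={kv:5d} Q={q:5d} speedup={s:.3f}")
print(f"speedup range: {lo:.2f}x to {hi:.2f}x")
```

The smallest gain comes from the KV=Q=1024 row and the largest from the KV=4097, Q=1 row.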

Addresses: MFDNN-11755

This PR also refactors the checks for the input descriptors so that they are uniform between the internal primitive and the Graph API.

umar456 · 2025-03-20T15:09:23Z

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable benchdnn_all
enable benchdnn_graph
enable test_device_gpu
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg

umar456 added performance platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel labels Mar 19, 2025

umar456 requested review from a team as code owners March 19, 2025 20:21

syurkevi approved these changes Mar 19, 2025

dzarukin approved these changes Mar 19, 2025

src/common/sdpa_test_iface.cpp (outdated, resolved)

src/common/sdpa_utils.hpp (resolved)

umar456 added 2 commits March 20, 2025 08:07

xe: sdpa: refactor descriptor checks

6e9f67b

xe: sdpa: Update config for quantized sdpa with head_size of 64

ce45811

umar456 force-pushed the uarshad/more_sdpa_configs branch from 4599fbd to ce45811 Compare March 20, 2025 15:07

umar456 merged commit 383a3fb into main Mar 21, 2025
15 of 17 checks passed

umar456 deleted the uarshad/more_sdpa_configs branch March 21, 2025 15:35
