vulkan: use scalar FA rather than coopmat2 when N==1 #13554

jeffbolznv · 2025-05-15T04:26:23Z

before:
Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench.exe -m C:\models\llama-2-7b.Q4_0.gguf -n 128,1024,4096,8192 -p 0 -fa 0,1 --prio 2  --repetitions 1
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  0 |           tg128 |        100.81 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  0 |          tg1024 |         94.63 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  0 |          tg4096 |         77.88 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  0 |          tg8192 |         62.73 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  1 |           tg128 |         98.30 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  1 |          tg1024 |         93.27 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  1 |          tg4096 |         77.11 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  1 |          tg8192 |         62.65 ± 0.00 |

after:
Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench.exe -m C:\models\llama-2-7b.Q4_0.gguf -n 128,1024,4096,8192 -p 0 -fa 0,1 --prio 2  --repetitions 1
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  0 |           tg128 |        100.92 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  0 |          tg1024 |         94.62 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  0 |          tg4096 |         77.38 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  0 |          tg8192 |         62.65 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  1 |           tg128 |         98.89 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  1 |          tg1024 |         94.21 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  1 |          tg4096 |         80.34 ± 0.00 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |  1 |          tg8192 |         66.53 ± 0.00 |
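
To make the before/after difference easier to read, the relative change for the `fa = 1` rows can be computed directly from the two tables. This is only a small sketch: the numbers are copied from the tables above, and the helper name `percent_change` is made up for illustration. Each configuration was run with `--repetitions 1`, so there is no variance estimate, and the `fa = 0` rows (which this change should not affect) already move by a fraction of a percent between runs. Small differences at short lengths may therefore be noise.

```python
# Throughput (t/s) for the fa=1 rows, copied from the before/after tables.
before = {"tg128": 98.30, "tg1024": 93.27, "tg4096": 77.11, "tg8192": 62.65}
after = {"tg128": 98.89, "tg1024": 94.21, "tg4096": 80.34, "tg8192": 66.53}


def percent_change(old, new):
    """Relative change from old to new, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


# Per-test relative change in tokens/second with flash attention enabled.
changes = {test: percent_change(before[test], after[test]) for test in before}

for test, pct in changes.items():
    print(f"{test:>7}: {before[test]:6.2f} -> {after[test]:6.2f} t/s ({pct:+.2f}%)")
```

With these figures the gain grows with generation length, from under 1% at tg128 to roughly 6% at tg8192.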

vulkan: use scalar FA rather than coopmat2 when N==1

a68b90f

jeffbolznv requested a review from 0cc4m May 15, 2025 04:26

github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels May 15, 2025
