Dramatic increase of the C++ dlls size. Why? #12267

zsogitbe · 2025-03-08T06:54:25Z

Does anyone know why the size of our C++ DLLs (CUDA) has suddenly increased to nearly 700 MB, compared to around 60 MB previously? I've also noticed some new files, including ggml-cuda.dll, which is over 600 MB. Could this be due to incorrect dynamic linking, instead of statically linking only the necessary code?

Thank you in advance for your help!

Answered by zsogitbe

abc-nix · 2025-03-08T08:46:15Z

When was the last time you built llama.cpp? The CUDA build has been large for some time now. It would probably be smaller if you built it only for your device architecture rather than generically. You may be able to make it even smaller by building with the compression option GGML_CUDA_COMPRESSION_MODE=size, added in #12029, but that requires CUDA Toolkit 12.8.
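A minimal sketch of gating the compression option on the toolkit version, since the option is said above to need CUDA Toolkit 12.8. The helper names `version_ge` and `compression_flag` are hypothetical, the version is passed in by hand rather than detected, and `sort -V` is assumed to be available (GNU coreutils).

```shell
# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# compression_flag VERSION: print the compression option only for toolkits >= 12.8.
# (Hypothetical helper; the 12.8 threshold comes from the comment above.)
compression_flag() {
  if version_ge "$1" "12.8"; then
    echo "-DGGML_CUDA_COMPRESSION_MODE=size"
  fi
}

# Example: print the configure command for a toolkit version supplied by hand.
echo "cmake -B build -DGGML_CUDA=ON $(compression_flag 12.8)"
```

With an older version such as 12.4, `compression_flag` prints nothing, so the configure command falls back to an uncompressed build.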

ggerganov Mar 8, 2025
Maintainer

Show the commands that you use to build. I'm not super familiar with the CUDA build process, but I suspect you are doing something wrong, because on the CI machine the binaries are only ~65MB:

This is using:

cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native
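To compare a local build against numbers like these, one option is to list the shared libraries in the build tree by size. This is only a sketch: `list_lib_sizes` is a hypothetical helper, the `build` directory name is an assumption, and on Windows it would need a POSIX-style shell such as Git Bash.

```shell
# list_lib_sizes DIR: print size (KiB) and path of each shared library under DIR,
# largest first. Hypothetical helper, not part of llama.cpp.
list_lib_sizes() {
  find "$1" -type f \( -name '*.dll' -o -name '*.so' -o -name '*.dylib' \) \
    -exec du -k {} + | sort -rn
}

# Example: inspect the build tree if it exists (directory name is an assumption).
if [ -d build ]; then
  list_lib_sizes build
fi
```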

zsogitbe Mar 8, 2025
Author

Thank you for your fast reply.
Yes, these sizes look reasonable to me. In the meantime, I checked the file sizes in the LLamaSharp distribution, which uses the default compilation from llama.cpp. For cuda12 we get:

These files are already much larger. We really need a way to get consistently good builds.

I have tried to compile myself with CMake on Windows 11 and cuda 12.8 with the following params:

With this I get for Windows Release cuda 12.8:

So, in conclusion, we get much larger file sizes everywhere. Can you please help? Thanks a lot.

abc-nix Mar 8, 2025

In June 2024, FA kernels for different KV cache quant sizes were added in #7527. Disabling GGML_CUDA_FA_ALL_QUANTS may make the build smaller if you are not going to use KV cache quantization.
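Before changing the option, it may help to check what an existing build tree actually has recorded for it. The sketch below reads a value out of `CMakeCache.txt`, which stores entries as `NAME:TYPE=VALUE`. The helper name `cache_value` and the `build` directory are assumptions.

```shell
# cache_value FILE NAME: print the value recorded for NAME in a CMakeCache.txt file.
# Cache entries look like NAME:TYPE=VALUE, e.g. GGML_CUDA_FA_ALL_QUANTS:BOOL=OFF.
cache_value() {
  sed -n "s/^$2:[A-Z]*=\(.*\)\$/\1/p" "$1"
}

# Example: query an existing build tree (directory name is an assumption).
if [ -f build/CMakeCache.txt ]; then
  cache_value build/CMakeCache.txt GGML_CUDA_FA_ALL_QUANTS
  cache_value build/CMakeCache.txt CMAKE_CUDA_ARCHITECTURES
fi
```

To switch the option off explicitly, `-DGGML_CUDA_FA_ALL_QUANTS=OFF` can be added to the configure command.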

zsogitbe Mar 9, 2025
Author

Thank you, abc-nix. I am a bit hesitant because I’m confused about how ggerganov still manages to maintain the former file sizes (~60 MB for the DLL), while we are seeing hundreds of MB more. I don’t believe he is using the GGML_CUDA_COMPRESSION_MODE=size option. Before considering this option, I’d like to understand why this discrepancy is happening. Concerning GGML_CUDA_FA_ALL_QUANTS=OFF, we will need to test if this causes a big difference, but even without it we have got more than 300 MB for the ggml module, much more than what Georgi has.

abc-nix Mar 9, 2025

I hope you or someone else can find a way to reduce the size for Windows. I downloaded the official release .zip file just to compare, and the cuda.dll is also large there. There should be a way to optimize it for Windows as there is already for Linux.

I switched to Linux many years ago, so I cannot help directly.

zsogitbe · 2025-03-09T11:20:27Z

I believe I've identified the issue. I now have a smaller ggml-cuda.dll (51MB for the Windows Release with one architecture and 157MB with two architectures). The issue seems to stem from the -arch=native option. NVCC doesn't support this option, but it appears the code requires it for some reason. I had previously removed it, but I’ve now added it back. Additionally, the architecture(s) need to be defined manually (e.g., CMAKE_CUDA_ARCHITECTURES="61;89") while ensuring that -arch=native remains present in the CMake script.
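As a rough sketch, an explicit architecture list could be passed at configure time as shown below. The values 61 and 89 are the examples from this post, the `build` directory name is an assumption, and the sketch does not cover the -arch=native setting in the CMake script mentioned above. The list is quoted because an unquoted `;` would be read by the shell as a command separator.

```shell
# Architectures to compile for; 61 and 89 are only the example values from the post.
CUDA_ARCHS="61;89"

# Compose the configure command, keeping the list quoted so ';' survives the shell.
CONFIGURE_CMD="cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=\"${CUDA_ARCHS}\""
echo "$CONFIGURE_CMD"

# Once the printed command looks right, configure and build:
# eval "$CONFIGURE_CMD" && cmake --build build --config Release
```

Each additional architecture adds another set of compiled kernels, which is in line with the 51MB versus 157MB figures reported above.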
