Issue converting DeepSeek-R1 to GGUF format. #11989
-
Hi all, I'm attempting to convert DeepSeek-R1 (cloned in its entirety) and have had no success. I built the latest pull of llama.cpp with the oneAPI C++ compiler and Python 3.12.3 on Linux. I noticed that the model was not included in convert_hf_to_gguf_update.py, so I added the link (the entry I added is shown at the end of this post) and ran that script with a Hugging Face token; it came back error free. I was also able to run the command below, which also completed without errors:

```
python3 convert_hf_to_gguf.py models/tokenizers/deepseek-r1/ --outfile models/ggml-vocab-deepseek-r1.gguf --vocab-only
```

The build had no issue converting DeepSeek-R1-Distill-Llama-70B, which is working perfectly, but for DeepSeek-R1 I get the following error almost immediately:

```
INFO:hf-to-gguf:Loading model: DeepSeek-R1
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-000163.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {7168, 129280}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.float8_e4m3fn --> F16, shape = {18432, 7168}
Traceback (most recent call last):
  File "/home/devcloud/llama.cpp/convert_hf_to_gguf.py", line 5010, in <module>
    main()
  File "/home/devcloud/llama.cpp/convert_hf_to_gguf.py", line 5004, in main
    model_instance.write()
  File "/home/devcloud/llama.cpp/convert_hf_to_gguf.py", line 439, in write
    self.prepare_tensors()
  File "/home/devcloud/llama.cpp/convert_hf_to_gguf.py", line 4040, in prepare_tensors
    super().prepare_tensors()
  File "/home/devcloud/llama.cpp/convert_hf_to_gguf.py", line 298, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcloud/llama.cpp/convert_hf_to_gguf.py", line 4037, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcloud/llama.cpp/convert_hf_to_gguf.py", line 214, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv'
```

Any help would be appreciated, thanks!
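For reference, the entry I added to the `models` list in convert_hf_to_gguf_update.py looked roughly like this (the exact fields may differ between llama.cpp revisions):

```python
# Entry appended to the `models` list in convert_hf_to_gguf_update.py.
# "tokt" selects the tokenizer type; DeepSeek-R1 uses a BPE tokenizer.
{"name": "deepseek-r1", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/deepseek-ai/DeepSeek-R1"},
```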
-
same issue
The original model has fp8 weights and some extra tensors with dequantization scales. llama.cpp doesn't support this; you have to dequantize the weights to BF16 first (e.g. with the fp8_cast_bf16.py script from the DeepSeek-V3 repository) and then run convert_hf_to_gguf.py on the BF16 checkpoint.
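To illustrate what those `weight_scale_inv` tensors are for, here's a minimal sketch of the per-block dequantization, assuming the 128×128 block layout DeepSeek's fp8 checkpoints use. The function name and shapes are illustrative, not llama.cpp code:

```python
import torch

BLOCK = 128  # block size assumed for DeepSeek's fp8 quantization

def dequant_fp8(weight: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    """Dequantize a float8_e4m3fn weight using its per-block scale tensor.

    `scale_inv` is assumed to have shape (ceil(rows/BLOCK), ceil(cols/BLOCK));
    each entry scales one 128x128 tile of `weight`.
    """
    rows, cols = weight.shape
    w = weight.to(torch.float32)
    # Expand each block scale across its tile, then trim to the weight shape.
    s = scale_inv.repeat_interleave(BLOCK, dim=0)[:rows, :]
    s = s.repeat_interleave(BLOCK, dim=1)[:, :cols]
    return (w * s).to(torch.bfloat16)

# e.g. applied to 'model.layers.0.mlp.down_proj.weight' together with its
# 'model.layers.0.mlp.down_proj.weight_scale_inv' companion tensor.
```

Once the scales are folded into the weights and the `*_scale_inv` tensors are dropped, the converter no longer encounters tensor names it can't map.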