Loading GGML converted qlora models #4018

ragesh2000 asked this question in Q&A
Answered by KerfuffleV2
I have the LoRA weights of a fine-tuned model (adapter_model.bin), and I created a GGML version of the file using the Python script convert-lora-to-ggml.py, so I now have the ggml_model.bin file. How do I load it?
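For reference, the conversion step described above is typically run like this. This is a sketch, not an exact transcript from the question: the adapter directory path is a placeholder, and the exact invocation can vary between llama.cpp checkouts.

```shell
# Convert a PEFT LoRA adapter to GGML format with llama.cpp's
# conversion script. The path is a placeholder for the directory
# containing adapter_model.bin and adapter_config.json.
python convert-lora-to-ggml.py /path/to/lora-adapter-dir
```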
Answered by KerfuffleV2 on Nov 12, 2023 (1 comment, 11 replies)
You need to use the base (full) model with `-m` and the converted LoRA with `--lora`. So it should look like `-m my_full_model.gguf --lora ggml-adapter-model.bin`.

Note: `-l` is not the short form of `--lora`; it is for setting logit bias.
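Putting the answer together, a full invocation might look like the sketch below. The binary name, model file, and prompt are illustrative (llama.cpp has shipped its example binary under names like `main`); only the `-m`/`--lora` pairing comes from the answer above.

```shell
# Load the full base model with -m and apply the converted
# LoRA adapter with --lora (file names are placeholders).
./main -m my_full_model.gguf \
       --lora ggml-adapter-model.bin \
       -p "Hello, world"
```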