Changing the number of experts with a Mixtral GGUF? #5114
-
I'm using ooba webui, and I noticed that the Exllamav2 model loader has a 'Number of experts per token' option for Mixtral that lets you set it to something other than the default of 2. But when I use the llama.cpp loader (because I'm running an 8-bit GGUF of Mixtral), that option isn't available. I want to see how good a response I can get from Mixtral, so I don't want to switch to a lower-bit quant just to fit the model on my GPU, since that would degrade the response in a different way. Is there any way to use a higher number of experts while still using a GGUF?
Replies: 4 comments
-
I tried this with `--override-kv llama.expert_used_count=int:3` and it worked.
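For context, a full invocation might look like the sketch below. Only the `--override-kv` flag comes from the reply above; the binary name, model path, and other flags are illustrative and depend on your build and setup.

```sh
# Illustrative only: binary name and model path are placeholders,
# the --override-kv flag is the part reported to work above.
./main -m ./models/mixtral-8x7b-instruct.Q8_0.gguf \
    --override-kv llama.expert_used_count=int:3 \
    -p "Explain mixture-of-experts routing in one paragraph."
```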
-
That's really useful info, thanks! Now I'll have a look to see where I could splice that into ooba's codebase. Unless anyone knows where that is offhand?
-
I've found this bit in llamacpp_model.py, but I haven't yet worked out how to set llama.expert_used_count (or n_expert_used) to 3. Adding both of these to params doesn't seem to do the job.
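A minimal sketch of what I think the equivalent is on the Python side, assuming your installed llama-cpp-python version exposes the `kv_overrides` argument on `Llama(...)` (the model path and prompt below are just placeholders, not something from ooba's code):

```python
from llama_cpp import Llama

# Sketch: pass the GGUF metadata override directly to llama-cpp-python.
# This assumes your llama-cpp-python version supports `kv_overrides`;
# the dict value plays the same role as --override-kv ...=int:3 on the CLI.
llm = Llama(
    model_path="mixtral-8x7b-instruct.Q8_0.gguf",   # placeholder path
    n_gpu_layers=-1,                                 # offload as much as fits
    kv_overrides={"llama.expert_used_count": 3},     # use 3 experts per token
)

out = llm("Explain mixture-of-experts routing in one paragraph.", max_tokens=128)
print(out["choices"][0]["text"])
```

If that's right, the ooba-side change would presumably be adding a `kv_overrides` entry to whatever argument dict llamacpp_model.py passes into `Llama(...)`, but exactly where depends on the webui version.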
-
If anyone wants to continue this thread with regard to integrating this into ooba webui, I've opened a new thread on the Discussions tab: