-
I would just use Q8 since it acts just like fp16 anyway.
-
You can use any model (`fp16` or any quantized version) to perform the imatrix calculation. To keep the calculation from taking an extremely long time, the model should fit into at least your RAM. If you decide to use a quantized model for imatrix calculations, the quantized model should be better than the quantization you intend to make with the computed imatrix. As @sorasoras mentions, `Q8_0` can be used for any quantization without degradation of quality (but `Q8_0` does not help you for a 70B model on a system with 32 GB of RAM). I have not experimented with using, e.g., `Q4_K_M` to compute an imatrix that I then use to quantize, say, `IQ2_XS`. Perhaps you can try and report your experience?
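For reference, a minimal sketch of that two-step workflow (compute the imatrix, then quantize with it) using llama.cpp's command-line tools. The binary names assume a recent build (`llama-imatrix`, `llama-quantize`; older builds call them `imatrix` and `quantize`), and the model/file paths here are hypothetical:

```sh
# 1. Compute the importance matrix over a calibration text,
#    using a model that fits in RAM (here a hypothetical Q8_0 file).
./llama-imatrix -m models/model-Q8_0.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize the full-precision model to a low-bit type such as IQ2_XS,
#    feeding in the imatrix computed above.
./llama-quantize --imatrix imatrix.dat models/model-f16.gguf models/model-IQ2_XS.gguf IQ2_XS
```

The imatrix file only needs to be computed once per model; you can then reuse it to produce any quantization type you like.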