-
I would just use Q8 since it acts just like fp16 anyway.
-
You can use any model (`fp16` or any quantized version) to perform the imatrix calculation. To keep the calculation from taking an extremely long time, the model should fit into at least your RAM. If you decide to use a quantized model for imatrix calculations, the quantized model should be better than the quantization you intend to make with the computed imatrix. As @sorasoras mentions, `Q8_0` can be used for any quantization without degradation of quality (but `Q8_0` does not help you for a 70B model on a system with 32 GB of RAM). I have not experimented with using, e.g., `Q4_K_M` to compute an imatrix that I then use to quantize, say, `IQ2_XS`. Perhaps you can try and report your experience?
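For reference, a minimal sketch of that two-step workflow (compute the imatrix, then quantize with it) using llama.cpp's command-line tools. The binary names assume a recent build (`llama-imatrix`, `llama-quantize`; older builds call them `imatrix` and `quantize`), and the model/file paths here are hypothetical:

```sh
# 1. Compute the importance matrix over a calibration text,
#    using a model that fits in RAM (here a hypothetical Q8_0 file).
./llama-imatrix -m models/model-Q8_0.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize the full-precision model to a low-bit type such as IQ2_XS,
#    feeding in the imatrix computed above.
./llama-quantize --imatrix imatrix.dat models/model-f16.gguf models/model-IQ2_XS.gguf IQ2_XS
```

The imatrix file only needs to be computed once per model; you can then reuse it to produce any quantization type you like.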