How much memory does imatrix calculation use? #5222

Answered by ikawrakow
avada-z asked this question in Q&A
You can use any model (fp16 or any quantized version) to perform the imatrix calculation. To keep the calculation from taking an extremely long time, the model should at least fit into your RAM. If you decide to use a quantized model for imatrix calculations, the quantized model should be better than the quantization you intend to make with the computed imatrix. As @sorasoras mentions, Q8_0 can be used for any quantization without degradation of quality (but Q8_0 does not help you for a 70B model on a system with 32 GB RAM). I have not experimented with using, e.g., Q4_K_M to compute an imatrix that I then use to quantize, say, IQ2_XS. Perhaps you can try and report your experience?
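To see why Q8_0 does not help for a 70B model on a 32 GB machine, you can estimate the model's in-memory size from its parameter count and bits-per-weight. A rough sketch (the bits-per-weight figures are approximate; Q8_0 stores 8-bit weights plus per-block scales, giving about 8.5 bits per weight, and Q4_K_M is roughly in the high-4s):

```python
def approx_model_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough model size: parameters x bits-per-weight, expressed in GiB."""
    return n_params * bits_per_weight / 8 / 1024**3

# 70B at Q8_0 (~8.5 bits/weight including block scales)
print(round(approx_model_gib(70e9, 8.5)))  # ~69 GiB -- well above 32 GiB RAM

# 70B at Q4_K_M (~4.8 bits/weight, approximate)
print(round(approx_model_gib(70e9, 4.8)))  # ~39 GiB -- still above 32 GiB
```

So even a mid-range K-quant of a 70B model will not fit in 32 GB of RAM, which is why a smaller quantization (with the quality trade-off discussed above) would be needed for the imatrix pass on such a machine.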

Answer selected by avada-z
Category
Q&A