Bitsandbytes models stopped working in the new 0.7.1 version #12849
Closed · davefojtik announced in Q&A · Replies: 1 comment
-
Solved. The problem was an outdated FlashInfer installed in my container image. Version …
-
In 0.7.0 everything works fine, but when I update to 0.7.1 with the same models, code, etc., it's broken.
The logs spam a lot of `MLA is not supported with bitsandbytes quantization. Disabling MLA.` messages, even though the `VLLM_MLA_DISABLE` environment variable is set to true. Then, after the `Capturing cudagraphs` part, vLLM returns the following error:
Every update I go through the engine arguments and environment variables in the documentation to see if there's something new or changed, but this time I didn't see anything that could cause this. Did I miss some major change?
Here's the full log (we're running a custom vLLM build on RunPod serverless):