-
Please follow the troubleshooting guide to debug this. If it still doesn't work, please open a GitHub issue so we can investigate further. (GitHub Discussions aren't reviewed nearly as often.)
-
Run cmd:
After loading the weights into memory, the server takes a very long time to start. For an 8B FP8 model, it used around 30 GB of memory after the weights were loaded. It then freezes for a long time and eventually fails. I have tried removing the --quantization fp8 option, but nothing changed. It is running in a devcontainer in WSL2, and I have successfully started the server only once. Not sure what happened.
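Because the report mentions roughly 30 GB of memory in use inside WSL2, it may help to check how much memory the container actually sees before and during startup. The following is only a minimal sketch, assuming a Linux-style /proc/meminfo is readable inside the devcontainer. The helper names parse_meminfo, available_gib, and read_available_gib are hypothetical and are not part of vLLM.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; the unit suffix is ignored here.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_gib(text):
    """Return MemAvailable converted from kB to GiB."""
    return parse_meminfo(text)["MemAvailable"] / (1024 ** 2)


def read_available_gib(path="/proc/meminfo"):
    """Read the given meminfo file (assumed path) and return available GiB."""
    with open(path) as f:
        return available_gib(f.read())
```

Calling read_available_gib() once before launching the server and again while it appears frozen would show whether available memory is running out. This is a diagnostic aid under the stated assumptions, not a confirmed cause of the failure.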
This is the devcontainer configuration:
This is the NVIDIA driver information in WSL2:

vLLM API server version 0.6.6.post1