We are using Qwen2.5-14B-Instruct with vLLM. However, we found that the following things can change the output, even when we set temperature=0, top_p=1, seed=42:

- vllm serve produces different output than vLLM offline inference, even with the same chat_template (see the sketch below)
- vllm serve with a different number of GPUs
- different vLLM versions
- running on H100 vs. H200

This is strange. Can someone explain why this happens, and how we can keep the output fixed when changing inference environments?
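For context, here is a minimal sketch of the two call paths we are comparing. The prompt, port, and max_tokens are placeholders; the server side assumes a running `vllm serve Qwen/Qwen2.5-14B-Instruct` on the default port.

```python
# Sketch only: compares offline LLM.chat() against the OpenAI-compatible server.
from openai import OpenAI
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-14B-Instruct"
MESSAGES = [{"role": "user", "content": "Explain KV caching in one paragraph."}]  # placeholder prompt

# 1) Offline inference: greedy decoding with a fixed seed.
llm = LLM(model=MODEL)
params = SamplingParams(temperature=0, top_p=1, seed=42, max_tokens=256)
offline_out = llm.chat(MESSAGES, params)  # applies the model's chat_template
print(offline_out[0].outputs[0].text)

# 2) Online inference against `vllm serve Qwen/Qwen2.5-14B-Instruct` (assumed on localhost:8000).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
online_out = client.chat.completions.create(
    model=MODEL,
    messages=MESSAGES,
    temperature=0,
    top_p=1,
    seed=42,
    max_tokens=256,
)
print(online_out.choices[0].message.content)
```

Even with identical sampling parameters like these, the two paths (and different GPU counts, vLLM versions, or GPU models) give different completions.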