File "/usr/local/lib/python3.12/dist-packages/genai_perf/profile_data_parser/llm_profile_data_parser.py", line 229, in _preprocess_response
text = self._extract_openai_text_output(r)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/genai_perf/profile_data_parser/llm_profile_data_parser.py", line 341, in _extract_openai_text_output
completions = data["choices"][0]
~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
2025-03-20 09:12 [ERROR] genai_perf.main:58 - list index out of range
Description
genai-perf fails with an IndexError (list index out of range) on an empty choices array when the request sets:
"stream_options": {
"include_usage": true
}
choices can be empty for the last chunk when include_usage is set, as per the spec here: https://platform.openai.com/docs/api-reference/chat-streaming/streaming#chat-streaming/streaming-choices
Log from running genai-perf
Triton Information
using container: nvcr.io/nvidia/tritonserver:25.02-py3-sdk
To Reproduce
Host a NIM enterprise backend serving the model meta/llama-3.3-70b-instruct, deployed via the standard NIM operator.
Our API gateway appends the following to each request:
"stream_options": {
"include_usage": true
}
genai-perf command used:
genai-perf profile \
  -m $MODEL \
  --service-kind openai \
  --endpoint-type chat \
  --streaming \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Accept: text/event-stream" \
  -u OUR-ENDPOINT \
  --synthetic-input-tokens-mean $INPUT_SEQUENCE_LENGTH \
  --synthetic-input-tokens-stddev $INPUT_SEQUENCE_STD \
  --concurrency $CONCURRENCY \
  --output-tokens-mean $OUTPUT_SEQUENCE_LENGTH \
  --extra-inputs max_tokens:$OUTPUT_SEQUENCE_LENGTH \
  --extra-inputs min_tokens:$OUTPUT_SEQUENCE_LENGTH \
  --warmup-request-count 1 \
  --measurement-interval 100000 \
  --extra-inputs ignore_eos:true \
  -- \
  -v \
  --max-threads=256
Custom curl to view the empty choices array:
curl -X POST 'llama-70b-instruct-load-test.nim-namespace:8000/v1/chat/completions' --header 'Content-Type: application/json' --header "accept: text/event-stream" --data '{
"messages": [
{
"role": "user",
"content": "generate something cool"
}
],
"model": "meta/llama-3.3-70b-instruct",
"stream": true,
"max_tokens": 20,
"min_tokens": 20,
"ignore_eos": true,
"stream_options": {
"include_usage": true
}
}'
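To pick the usage-only chunk out of the curl output, a small helper along these lines may be useful. It is only a sketch: it assumes the server emits standard `data: ...` server-sent-event lines ending with `data: [DONE]`, the function name `find_empty_choice_chunks` is hypothetical, and the sample lines are invented for illustration rather than captured from the NIM endpoint.

```python
import json


def find_empty_choice_chunks(sse_lines):
    """Return the parsed chunks whose "choices" list is missing or empty."""
    empty = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            continue  # end-of-stream sentinel, not JSON
        chunk = json.loads(payload)
        if not chunk.get("choices"):
            empty.append(chunk)
    return empty


# Invented sample stream (assumption: shape follows the OpenAI streaming format).
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 20, "total_tokens": 25}}',
    "",
    "data: [DONE]",
]

empty_chunks = find_empty_choice_chunks(sample_lines)
print(len(empty_chunks))
```

Saving the curl output to a file and passing its lines to the helper would show whether the backend sends a trailing chunk with `"choices": []`.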
Expected behavior
genai-perf should skip, or otherwise handle, streamed chunks whose choices array is empty.
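One possible shape for such handling, as a minimal sketch rather than the actual genai-perf patch: guard the index into `choices` and treat a chunk without choices as contributing no text. The function `extract_openai_text_output` below is a hypothetical stand-in for the parser's `_extract_openai_text_output`, not its real implementation, and the payloads are made up for illustration.

```python
import json


def extract_openai_text_output(response: str) -> str:
    """Return the text carried by one chat-completion response or chunk.

    Hypothetical stand-in for the parser helper; returns "" for chunks
    that carry no choices (for example a final usage-only chunk).
    """
    data = json.loads(response)
    choices = data.get("choices") or []  # covers a missing key, null, and []
    if not choices:
        return ""
    first = choices[0]
    # Streaming chunks carry "delta"; non-streaming responses carry "message".
    body = first.get("delta") or first.get("message") or {}
    return body.get("content") or ""


# Illustrative payloads (assumptions, not captured from the server).
content_chunk = '{"choices": [{"index": 0, "delta": {"content": "Hi"}}]}'
usage_chunk = '{"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 20, "total_tokens": 25}}'

print(repr(extract_openai_text_output(content_chunk)))
print(repr(extract_openai_text_output(usage_chunk)))
```

Returning an empty string keeps the usage-only chunk from raising while leaving the text of earlier chunks untouched; whether such a chunk should also be excluded from token and latency accounting is a separate decision for the maintainers.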