
Commit 9980e86

sunxiaoxia2022 and wangleis authored Feb 25, 2025
Setting api to sync when hint is latency in benchmark_app (#29060)
### Details:

Currently, benchmark_app defaults to async mode even when the latency hint is set. So when benchmark_app runs with the latency hint on a dual-socket platform, OV RT creates the stream on the socket that benchmark_app is running on, but the OS may schedule the benchmark_app thread onto the other socket during inference, and performance drops. After changing the default setting to sync mode, the benchmark_app thread becomes one of the inference threads, which reduces cross-socket switching and leads to more stable performance results.

- *Setting api to sync when hint is latency*

### Tickets:
- *CVS-154111*

---------

Co-authored-by: Wanglei Shen <wanglei.shen@intel.com>
1 parent: c3d0954
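The same default-resolution rule is added to both the C++ sample (main.cpp) and the Python tool (main.py). A minimal Python sketch of the behavior, mirroring the main.py hunk below (the standalone function wrapper is illustrative, not part of the tool):

```python
# Sketch of the new -api default resolution: an explicitly passed
# -api value always wins; an empty value is filled in from the hint.
def resolve_api_type(api_type: str, perf_hint: str) -> str:
    if api_type != "":
        return api_type
    return "sync" if perf_hint == "latency" else "async"

assert resolve_api_type("", "latency") == "sync"
assert resolve_api_type("", "throughput") == "async"
assert resolve_api_type("async", "latency") == "async"  # explicit override kept
```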

File tree: 6 files changed (+16 −7 lines)
 

docs/articles_en/get-started/learn-openvino/openvino-samples/benchmark-tool.rst (+2 −2)

@@ -382,7 +382,7 @@ available options and parameters:
     -t TIME, --time TIME  Optional. Time in seconds to execute topology.
 
     -api {sync,async}, --api_type {sync,async}
-                          Optional. Enable using sync/async API. Default value is async.
+                          Optional. Enable using sync/async API. When hint is throughput, default value is async. When hint is latency, default value is sync.
 
 
 Input shapes:
@@ -557,7 +557,7 @@ available options and parameters:
     -c <absolute_path>   Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.
     -cache_dir <path>    Optional. Enables caching of loaded models to specified directory. List of devices which support caching is shown at the end of this message.
     -load_from_file      Optional. Loads model from file directly without read_model. All CNNNetwork options (like re-shape) will be ignored
-    -api <sync/async>    Optional. Enable Sync/Async API. Default value is "async".
+    -api <sync/async>    Optional. Enable Sync/Async API. When hint is throughput, default value is "async". When hint is latency, default value is "sync".
     -nireq <integer>     Optional. Number of infer requests. Default value is determined automatically for device.
     -nstreams <integer>  Optional. Number of streams to use for inference on the CPU or GPU devices (for HETERO and MULTI device cases use format <dev1>:<nstreams1>, <dev2>:<nstreams2> or just <nstreams>). Default value is determined automatically for a device. Please note that although the automatic selection usually provides a reasonable performance, it still may be non-optimal for some cases, especially for very small models. See sample's README for more details. Also, using nstreams>1 is inherently throughput-oriented option, while for the best-latency estimations the number of streams should be set to 1.
     -inference_only      Optional. Measure only inference stage. Default option for static models. Dynamic models are measured in full mode which includes inputs setup stage, inference only mode available for them with single input data shape only. To enable full mode for static models pass "false" value to this argument: ex. "-inference_only=false".

samples/cpp/benchmark_app/benchmark_app.hpp (+4 −2)

@@ -98,7 +98,9 @@ static const char layout_message[] =
     "For example, \"input1[NCHW],input2[NC]\" or \"[NCHW]\" in case of one input size.";
 
 /// @brief message for execution mode
-static const char api_message[] = "Optional. Enable Sync/Async API. Default value is \"async\".";
+static const char api_message[] =
+    "Optional. Enable Sync/Async API. When hint is throughput, default value is \"async\". "
+    "When hint is latency, default value is \"sync\".";
 
 /// @brief message for #streams for CPU inference
 static const char infer_num_streams_message[] =
@@ -303,7 +305,7 @@ DEFINE_string(cache_dir, "", cache_dir_message);
 DEFINE_bool(load_from_file, false, load_from_file_message);
 
 /// @brief Define execution mode
-DEFINE_string(api, "async", api_message);
+DEFINE_string(api, "", api_message);
 
 /// @brief Number of infer requests in parallel
 DEFINE_uint64(nireq, 0, infer_requests_count_message);

samples/cpp/benchmark_app/main.cpp (+3)

@@ -58,6 +58,9 @@ bool parse_and_check_command_line(int argc, char* argv[]) {
         show_usage();
         throw std::logic_error("The percentile value is incorrect. The applicable values range is [1, 100].");
     }
+    if (FLAGS_api == "") {
+        FLAGS_api = FLAGS_hint == "latency" ? "sync" : "async";
+    }
     if (FLAGS_api != "async" && FLAGS_api != "sync") {
         throw std::logic_error("Incorrect API. Please set -api option to `sync` or `async` value.");
     }

tools/benchmark_tool/openvino/tools/benchmark/benchmark.py (+1 −1)

@@ -16,7 +16,7 @@ def percentile(values, percent):
 
 class Benchmark:
     def __init__(self, device: str, number_infer_requests: int = 0, number_iterations: int = None,
-                 duration_seconds: int = None, api_type: str = 'async', inference_only = None,
+                 duration_seconds: int = None, api_type: str = '', inference_only = None,
                  maximum_inference_rate: float = 0):
         self.device = device
         self.core = Core()

tools/benchmark_tool/openvino/tools/benchmark/main.py (+3)

@@ -49,6 +49,9 @@ def arg_not_empty(arg_value,empty_value):
             raise Exception("Cannot set precision for a compiled model. " \
                             "Please re-compile your model with required precision.")
 
+    if args.api_type == "":
+        args.api_type = "sync" if args.perf_hint == "latency" else "async"
+
     if args.api_type == "sync":
         if args.time == 0 and (args.number_infer_requests > args.number_iterations):
             raise Exception("Number of infer requests should be less than or equal to number of iterations in sync mode.")

tools/benchmark_tool/openvino/tools/benchmark/parameters.py (+3 −2)

@@ -108,8 +108,9 @@ def parse_args():
                       help="Optional. Enable model caching to specified directory")
     advs.add_argument('-lfile', '--load_from_file', required=False, nargs='?', default=argparse.SUPPRESS,
                       help="Optional. Loads model from file directly without read_model.")
-    args.add_argument('-api', '--api_type', type=str, required=False, default='async', choices=['sync', 'async'],
-                      help='Optional. Enable using sync/async API. Default value is async.')
+    args.add_argument('-api', '--api_type', type=str, required=False, default='', choices=['sync', 'async'],
+                      help='Optional. Enable using sync/async API. When hint is throughput, default value is async. '
+                           'When hint is latency, default value is sync.')
     advs.add_argument('-nireq', '--number_infer_requests', type=check_positive, required=False, default=0,
                       help='Optional. Number of infer requests. Default value is determined automatically for device.')
    advs.add_argument('-nstreams', '--number_streams', type=str, required=False, default=None,
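One subtlety in the parameters.py change: the new default `''` is not in `choices=['sync', 'async']`, but argparse validates `choices` only against values supplied on the command line, so the empty sentinel survives parsing and is resolved in main.py afterwards. A minimal standalone check of that behavior (not the benchmark_tool source; the hint default here is illustrative):

```python
import argparse

# argparse checks `choices` only for values given on the command line,
# so an out-of-choices default like '' passes through as a sentinel.
parser = argparse.ArgumentParser()
parser.add_argument('-api', '--api_type', type=str, default='',
                    choices=['sync', 'async'])
parser.add_argument('-hint', '--perf_hint', type=str, default='throughput')

args = parser.parse_args([])   # no -api on the command line
assert args.api_type == ''     # sentinel survives parsing

# Resolution step as added in main.py:
if args.api_type == "":
    args.api_type = "sync" if args.perf_hint == "latency" else "async"
assert args.api_type == 'async'  # throughput hint -> async
```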
