[--perf-template={def [default], csv, CUSTOM_TEMPLATE}]
where:
- def -- default template. Has problem name, problem dump, ops (if applicable), minimum and average time, and GFLOPS (if applicable).
- csv -- comma-separated values style template. Same as default, but dumps problem and descriptor values with a comma delimiter.
- CUSTOM_TEMPLATE -- user-defined template. Should consist of special options supported by the specific driver. Refer to the list of supported options below.
benchdnn supports both out-of-the-box and custom performance reports. A custom template should be passed via the command line and consists of terminal and nonterminal symbols. Nonterminal symbols are printed as-is. Descriptions of terminal symbols are given below.
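As a rough illustration of how such a template expands, the sketch below replaces every option written between percent signs with a value and copies all other text unchanged. It is not benchdnn's implementation; the `render` helper and the values passed to it are hypothetical placeholders.

```python
import re

# Matches either one %option% symbol or a run of text without percent signs.
_TOKEN = re.compile(r"%[^%]+%|[^%]+")


def render(template, values):
    """Replace each %option% with values[option]; copy other text as-is."""
    out = []
    for token in _TOKEN.findall(template):
        if token.startswith("%"):
            # Strip the surrounding percent signs to get the option name.
            out.append(str(values[token[1:-1]]))
        else:
            out.append(token)
    return "".join(out)


# Placeholder values, not real benchdnn output.
print(render("perf,%engine%,%name%", {"engine": "cpu", "name": "example"}))
```

Because only the percent-delimited options are interpreted, any delimiter (comma, semicolon, space) can separate them in a custom template.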
Note: The following generalized types are used below:
- 'Data md based' = {Bnorm, Eltwise, Lnorm, Lrn, Prelu, Shuffle, Softmax}
- 'Problem desc based' = {Bnorm, Conv, IP, Lrn, Matmul, Pool, Resampling, RNN}
- 'Ops based' = {Conv, IP, Matmul, RNN}
Data types options supported:
Syntax | Primitives | Description |
%cfg% | Conv, IP, Matmul, Pool, RNN | Config which describes data types and filling rules |
%dt% | Data md based, Resampling, Zeropad | Source and Destination Data type (precision) |
%ddt% | Binary, Concat, Reduction, Reorder, Sum | Destination data types (precision) |
%sdt% | Binary, Concat, Prelu, Reduction, Reorder, Sum | Source data types (precision) |
Format tags (physical memory layout) options supported:
Syntax | Primitives | Description |
%tag% | Data md based, Pool, Resampling, Zeropad | Source and Destination format tag |
%dtag% | Binary, Concat, Conv, IP, Matmul, Reduction, Reorder, Sum | Destination format tag |
%stag% | Binary, Concat, Conv, IP, Matmul, Prelu, Reduction, Reorder, Sum | Source format tag |
%wtag% | Conv, IP, Matmul | Weights format tag |
%stat_tag% | Lnorm | Layer Normalization statistics (mean and variance) format tag |
Other problem specific options supported:
Syntax | Primitives | Description |
%activation% | RNN | RNN activation function |
%alg% | Binary, Conv, Eltwise, Lrn, Pool, Reduction, Reorder, Resampling, RNN | Primitive algorithm |
%attr% | All | Primitive attributes |
%axis% | Concat, Shuffle, Softmax | Primitive axis |
%desc% | All | String style problem descriptor |
%DESC% | All | CSV-style problem descriptor values only |
%dir% | All, except Concat, RNN, Reorder, Sum | Primitive prop kind |
%direction% | RNN | RNN execution direction |
%driver% | All | Name of the current driver (e.g. conv, reorder) |
%engine% | All | Engine kind |
%flags% | Bnorm, Lnorm, Reorder | Primitive flags |
%group% | Shuffle | Shuffle group |
%impl% | All | Library implementation name for a given problem |
%idx% | All | Test index |
%mb% | Problem desc based, Eltwise, Softmax | Mini-batch value from user input. Prints 0 in case of input --mb=0 |
%name% | Problem desc based | Problem name |
%prb% | All | Canonical problem (options and descriptor in REPRO style) |
%prop% | RNN | RNN prop kind |
Performance profiling. All options are modifier extended (see below). Modifiers
change the meaning of terminal symbols. For example, the sign '-' means the
minimum (in terms of time). Modifiers should be specified right after the first
percent sign in a specific order: the time modifier first, then the unit
modifier. For example, write %-Gflops%, not %G-flops%.
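The ordering rule can be sketched as a small parser for performance options. This is an illustrative assumption about how the syntax could be checked, not benchdnn's own parser; `parse_symbol` and its regular expression are hypothetical, and only the modifier characters come from the "Modifiers supported" table.

```python
import re

# Modifier characters as listed in the "Modifiers supported" table.
TIME_MODIFIERS = {"-": "min", "0": "avg", "+": "max"}
UNIT_MODIFIERS = {"": 1e0, "K": 1e3, "M": 1e6, "G": 1e9}

# Hypothetical pattern: optional time modifier, then optional unit modifier,
# then a lowercase option name, all between percent signs.
_SYMBOL = re.compile(r"^%([-0+]?)([KMG]?)([a-z_]+)%$")


def parse_symbol(symbol):
    """Return (time modifier name, unit scale, option name)."""
    match = _SYMBOL.match(symbol)
    if match is None:
        raise ValueError(f"not a valid performance option: {symbol!r}")
    time_mod, unit_mod, name = match.groups()
    # An omitted time modifier falls back to the table's default, min.
    return TIME_MODIFIERS[time_mod or "-"], UNIT_MODIFIERS[unit_mod], name


print(parse_symbol("%-Gflops%"))
print(parse_symbol("%0time%"))
```

With this pattern, `%G-flops%` is rejected because the time modifier appears after the unit modifier.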
Caution: Threads must be pinned in order to get consistent frequency.
Performance profiling options supported:
Syntax | Primitives | Description |
%@time% | All | Execution time in milliseconds |
%@clocks% | All | Execution time in clocks |
%@freq% | All | Effective CPU frequency computed as clocks / time |
%@ibytes% | All | Number of bytes in the input memories of a problem |
%@obytes% | All | Number of bytes in the output memories of a problem |
%@iobytes% | All | Number of bytes in the input and output memories of a problem |
%@bw% | All | Bandwidth computed as iobytes / time |
%@ops% | Ops based | Number of ops required (padding is not taken into account) |
%@flops% | Ops based | FLOPS computed as ops / time |
%@cpdtime% | All | Primitive descriptor creation time in milliseconds. See Create Time Notes. |
%@cptime% | All | Primitive creation time in milliseconds. See Create Time Notes. |
%@ctime% | All | Total creation time (primitive descriptor + primitive) in milliseconds. See Create Time Notes. |
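The derived entries in the table (frequency, bandwidth, FLOPS) are ratios of the raw counters to time. The sketch below shows that arithmetic under two assumptions that the table does not state: time is converted from milliseconds to seconds so the rates are per second, and the unit modifier divides the final value. The `derived_metrics` helper and all input numbers are made up for illustration.

```python
def derived_metrics(time_ms, clocks, iobytes, ops, unit=1e0):
    """Compute freq, bw, and flops as counter / time, scaled by a unit."""
    seconds = time_ms / 1e3  # assumption: rates are reported per second
    return {
        "freq": clocks / seconds / unit,    # clocks / time
        "bw": iobytes / seconds / unit,     # iobytes / time
        "flops": ops / seconds / unit,      # ops / time
    }


# Made-up measurements, scaled with the Giga (1e9) unit modifier.
metrics = derived_metrics(time_ms=2.0, clocks=6e6, iobytes=4e6, ops=8e9, unit=1e9)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```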
Modifiers supported:
Name | Description |
Time: | |
- | min (time) -- default |
0 | avg (time) |
+ | max (time) |
Unit: | |
(none) | (1e0) -- default |
K | Kilo (1e3) |
M | Mega (1e6) |
G | Giga (1e9) |
benchdnn runs two create calls when the primitive cache feature is enabled. The
timer that collects create times in milliseconds captures both calls. The case
when the primitive cache was not hit can be obtained with no time modifier or
with the max modifier '+' (the default). The case when the primitive cache was
hit can be obtained with the min modifier '-'. The average modifier '0' is not
recommended for create times since this value doesn't represent any specific
scenario.
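A minimal sketch of why the average is not meaningful here, assuming the two create calls produce one slow time (cache not hit) and one fast time (cache hit). The `summarize_create_times` helper and the timings are invented for illustration.

```python
def summarize_create_times(create_ms):
    """Return the min, avg, and max of the collected create times."""
    return {
        "min": min(create_ms),                    # cache hit (faster call)
        "avg": sum(create_ms) / len(create_ms),   # mixes both scenarios
        "max": max(create_ms),                    # cache not hit (slower call)
    }


# Made-up timings in milliseconds: first call misses the cache, second hits it.
summary = summarize_create_times([12.0, 0.5])
print(summary)
```

The average lands between the two values and matches neither the cold nor the cached creation cost.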
Run a set of inner products, measuring performance for up to 6 seconds per problem and dumping results with the default performance template:
./benchdnn --ip --mode=p --max-ms-per-prb=6000 \
Output template: perf,%engine%,%name%,%prb%,%Gops%,%Gfreq%,%-time%,%-Gflops%,%0time%,%0Gflops%
Run a set of inner products, measuring performance and dumping results in CSV style:
./benchdnn --ip --mode=p --perf-template=csv \
Output template: perf,%engine%,%name%,%dir%,%cfg%,%attr%,%DESC%,%Gops%,%Gfreq%,%-time%,%-Gflops%,%0time%,%0Gflops%
Run a set of inner products, measuring performance and dumping results with a custom template that reports the descriptor, minimum time, and corresponding gigaFLOPS. Note: ',' is not a special symbol here; any other delimiter can be used:
./benchdnn --ip --mode=p --perf-template=%prb%,%-time%,%-Gflops% \
Output template: %prb%,%-time%,%-Gflops%
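Output produced with this custom template can be post-processed with a few lines of script. The sketch below assumes each result line has exactly the three fields of the template above, separated by commas; it splits from the right so that any commas inside the problem field do not break the parse. The `parse_line` helper and the sample line are hypothetical and do not come from a real run.

```python
def parse_line(line):
    """Split a '%prb%,%-time%,%-Gflops%' line into its three fields."""
    # Split from the right: the last two fields are always numeric.
    prb, min_time, gflops = line.strip().rsplit(",", 2)
    return {
        "prb": prb,
        "min_time_ms": float(min_time),
        "gflops_at_min_time": float(gflops),
    }


# Invented sample line; the problem field is a placeholder, not real output.
record = parse_line("EXAMPLE_PRB,with,commas,0.25,4.0")
print(record["prb"], record["min_time_ms"], record["gflops_at_min_time"])
```

Choosing a delimiter that cannot occur in the problem string, such as ';', would make a plain left-to-right split safe as well.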