
Commit 390d34c

fadara01 authored and vpirogov committed
cpu: aarch64: conv: Do not fall through to direct conv for BF16
For 1x1 kernels, indirect conv is faster than direct conv when the source, weights and destination are all BF16.
1 parent 1da1825 commit 390d34c


1 file changed: +4 -2 lines


src/cpu/aarch64/acl_convolution_utils.cpp

@@ -316,8 +316,10 @@ status_t init_conf_indirect_gemm(acl_conv_conf_t &acp, memory_desc_t &src_md,
         const primitive_attr_t &attr) {
     if (weights_md.ndims != 4) return status::unimplemented;
 
-    // Indirect is slower for small convolution kernels
-    if (weights_md.dims[2] == 1 && weights_md.dims[3] == 1)
+    // Indirect is slower for small convolution kernels, except when src, weight and dst are BF16
+    if (weights_md.dims[2] == 1 && weights_md.dims[3] == 1
+            && !everyone_is(data_type::bf16, src_md.data_type,
+                    weights_md.data_type, dst_md.data_type))
         return status::unimplemented;
 
     CHECK(acl_init_conf(acp, src_md, weights_md, dst_md, bias_md, cd, attr));
