generic: sycl: fix sum with many inputs #2108

sgeor255 · 2024-09-23T09:38:26Z

Description

This PR fixes an issue in the generic SYCL sum where the base_prims_ vector was accessed out of bounds because the index variable i was incremented before the access.
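
The pattern described above can be sketched as follows. This is a minimal illustration, not the actual oneDNN code: `run_buggy`, `run_fixed`, `buggy_overruns`, and the plain `std::vector<int>` standing in for `base_prims_` are hypothetical names. `at()` is used so that the overrun throws instead of invoking undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for base_prims_: each int plays the role of one
// nested primitive, and "executing" it just adds its value to acc.

// Buggy pattern: the index is advanced before the element is read, so
// element 0 is skipped and the last iteration reads one past the end.
int run_buggy(const std::vector<int> &base_prims) {
    int acc = 0;
    std::size_t i = 0;
    for (std::size_t n = 0; n < base_prims.size(); ++n) {
        ++i;
        acc += base_prims.at(i); // throws std::out_of_range when i == size()
    }
    return acc;
}

// Fixed pattern: read the element first, then advance the index.
int run_fixed(const std::vector<int> &base_prims) {
    int acc = 0;
    std::size_t i = 0;
    for (std::size_t n = 0; n < base_prims.size(); ++n) {
        acc += base_prims.at(i);
        ++i;
    }
    return acc;
}

// Reports whether the buggy loop runs past the end of the vector.
bool buggy_overruns(const std::vector<int> &base_prims) {
    try {
        run_buggy(base_prims);
    } catch (const std::out_of_range &) {
        return true;
    }
    return false;
}

// Usage: the fixed loop visits every element, the buggy one overruns.
void demo_index_order() {
    const std::vector<int> prims = {1, 2, 3};
    assert(run_fixed(prims) == 6);
    assert(buggy_overruns(prims));
}
```

With plain `operator[]` in place of `at()`, the same overrun would be undefined behavior, which is consistent with a crash that only shows up in some configurations.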

Checklist

General

Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
Have you formatted the code using clang-format?

mgouicem · 2024-09-23T11:09:13Z

make test
disable device_cpu
enable device_gpu
enable thr_generic
enable arch_rtx

mgouicem · 2024-09-23T14:08:29Z

Thanks @sgeor255, I am still seeing a lot of test cases failing in the pre-commit testing for sum.

358: tests:1408 passed:248 skipped:816 mistrusted:36 unimplemented:0 invalid_arguments:0 failed:308 listed:0
358: total: 8.10s; fill: 0.63s (8%); compute_ref: 0.15s (2%); compare: 2.45s (30%);
1/1 Test #358: test_benchdnn_modeC_sum_ci_gpu ...***Failed    8.24 sec

Here is an example:

358: run: --mode-modifier=P --sum --engine=gpu --sdt=f32:u8:s8 --ddt=s8 --stag=axb:axb:axb --scales=0.25:2:0.5 4x16x8x10x2
358: [   0][DST][0:0:0:0:0] exp_f32:    -18.9375 exp:    -18.9375 got:         nan diff:     nan rdiff:     nan
358: [   1][DST][0:0:0:0:1] exp_f32:     -18.125 exp:     -18.125 got:         nan diff:     nan rdiff:     nan
358: [   2][DST][0:0:0:1:0] exp_f32:    -17.3125 exp:    -17.3125 got:         nan diff:     nan rdiff:     nan
358: [   3][DST][0:0:0:1:1] exp_f32:         -16 exp:         -16 got:         nan diff:     nan rdiff:     nan
358: [   4][DST][0:0:0:2:0] exp_f32:    -15.1875 exp:    -15.1875 got:         nan diff:     nan rdiff:     nan
358: [   5][DST][0:0:0:2:1] exp_f32:     -14.375 exp:     -14.375 got:         nan diff:     nan rdiff:     nan
358: [   6][DST][0:0:0:3:0] exp_f32:    -13.5625 exp:    -13.5625 got:         nan diff:     nan rdiff:     nan
358: [   7][DST][0:0:0:3:1] exp_f32:      -12.75 exp:      -12.75 got:         nan diff:     nan rdiff:     nan
358: [   8][DST][0:0:0:4:0] exp_f32:    -11.4375 exp:    -11.4375 got:         nan diff:     nan rdiff:     nan
358: [   9][DST][0:0:0:4:1] exp_f32:     -10.625 exp:     -10.625 got:         nan diff:     nan rdiff:     nan
358: [COMPARE_STATS][DST]: trh=0 err_max_diff:     nan err_max_rdiff:     nan all_max_diff:     nan all_max_rdiff:     nan
358: 1401:FAILED (errors:10240 total:10240) __REPRO: --mode-modifier=P --sum --engine=gpu --sdt=f32:u8:s8 --ddt=s8 --stag=axb:axb:axb --scales=0.25:2:0.5 4x16x8x10x2

Should this PR fix most of these issues, or are they all different problems? They all seem to fail by getting NaN in their output, even though it seems fishy to get NaN for s8 output.
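
To illustrate why NaN looks suspicious here, the following sketch checks that no signed 8-bit value converts to a NaN float. It is plain C++ and makes no claim about how benchdnn reads back the destination; `any_s8_is_nan` and `demo_s8_never_nan` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Converts every int8_t value to float and reports whether any result is NaN.
bool any_s8_is_nan() {
    const int lo = std::numeric_limits<std::int8_t>::min();
    const int hi = std::numeric_limits<std::int8_t>::max();
    for (int v = lo; v <= hi; ++v) {
        const float f = static_cast<float>(static_cast<std::int8_t>(v));
        if (std::isnan(f)) return true;
    }
    return false;
}

// Usage: all 256 s8 values map to ordinary finite floats.
void demo_s8_never_nan() {
    assert(!any_s8_is_nan());
}
```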

densamoilov · 2024-09-23T23:18:00Z

@sgeor255, thanks for the fix, there is an internal tracker for the bug, can you please update it and put a link to this PR there as well?

sgeor255 · 2024-09-24T09:41:14Z

This PR only fixes the segmentation fault in sum when there are more than 8 inputs, which was reproducible by running the sum example in examples/primitives. I did run the tests locally (with all other implementations skipped) and did not see the above failures, because the above test was skipped in that case. I ran it again now without skipping any implementations and I can reproduce the above failure. It seems the ref:any implementation is picked for the above test. We can look into fixing this in a separate PR.
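
For context on why the input count matters, here is a minimal sketch of a scaled sum that processes its inputs in groups. It assumes, as a guess based on the 8-input threshold mentioned above, that inputs are handled at most 8 at a time. `max_inputs_per_pass`, `split_into_chunks`, and `scaled_sum` are hypothetical names and do not come from the oneDNN sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed per-pass limit; the value 8 is taken from the comment above.
constexpr std::size_t max_inputs_per_pass = 8;

// Returns the number of inputs handled by each pass, e.g. 10 -> {8, 2}.
std::vector<std::size_t> split_into_chunks(std::size_t n_inputs) {
    std::vector<std::size_t> chunks;
    for (std::size_t done = 0; done < n_inputs; done += max_inputs_per_pass)
        chunks.push_back(std::min(max_inputs_per_pass, n_inputs - done));
    return chunks;
}

// Computes dst[e] = sum over s of scales[s] * srcs[s][e], one chunk at a time.
// Assumes scales.size() == srcs.size() and that all sources have equal length.
std::vector<float> scaled_sum(const std::vector<std::vector<float>> &srcs,
        const std::vector<float> &scales) {
    if (srcs.empty()) return {};
    std::vector<float> dst(srcs[0].size(), 0.0f);
    std::size_t first = 0;
    for (std::size_t chunk : split_into_chunks(srcs.size())) {
        for (std::size_t s = first; s < first + chunk; ++s)
            for (std::size_t e = 0; e < dst.size(); ++e)
                dst[e] += scales[s] * srcs[s][e];
        first += chunk;
    }
    return dst;
}

// Usage: 10 inputs need two passes (8 + 2) and still produce the full sum.
void demo_chunked_sum() {
    assert(split_into_chunks(10) == (std::vector<std::size_t>{8, 2}));
    const std::vector<std::vector<float>> srcs(
            10, std::vector<float>{1.0f, 2.0f});
    const std::vector<float> scales(10, 0.5f);
    assert(scaled_sum(srcs, scales) == (std::vector<float>{5.0f, 10.0f}));
}
```

Under this assumption, any input count above 8 needs more than one pass, which is the case where a per-pass index starts to matter.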

generic: sycl: fix sum with many inputs

bb24631

sgeor255 requested a review from a team as a code owner September 23, 2024 09:38

github-actions bot added the platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic label Sep 23, 2024

t4c1 approved these changes Sep 23, 2024

ShanoToni approved these changes Sep 23, 2024

densamoilov approved these changes Sep 23, 2024

mgouicem approved these changes Sep 24, 2024

densamoilov merged commit b93e118 into uxlfoundation:main Sep 25, 2024
16 of 17 checks passed

vpirogov added this to the v3.7 milestone Dec 11, 2024
