generic: sycl: fix sum with many inputs #2108

Merged

sgeor255
Contributor

Description

This PR fixes an issue in the generic SYCL sum where the base_prims_ vector was accessed out of bounds because the index variable i was incremented before the access.
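The bug class described above can be illustrated with a minimal C++ sketch. Only the idea (i incremented before base_prims_ is indexed) comes from the description. The names base_prim_t and execute_all and the exact loop shape are hypothetical, and the actual oneDNN code and diff may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for one sub-primitive of the sum; this is not
// oneDNN's real type, only enough to show the indexing pattern.
struct base_prim_t {
    int id;
    int execute() const { return id; }
};

// Buggy pattern (illustrative only, not the actual diff):
//
//   for (std::size_t i = 0; i < base_prims.size();) {
//       ++i;                        // advanced too early...
//       base_prims[i]->execute();   // ...so i == size() on the last pass
//   }
//
// Fixed pattern: access with the current index, then advance it.
int execute_all(const std::vector<std::unique_ptr<base_prim_t>> &base_prims) {
    int checksum = 0;
    for (std::size_t i = 0; i < base_prims.size(); ++i) {
        assert(i < base_prims.size()); // always holds in the fixed loop
        checksum += base_prims[i]->execute();
    }
    return checksum;
}
```

With three primitives, the buggy loop would touch base_prims[1], base_prims[2] and then the nonexistent base_prims[3], skipping base_prims[0]. The fixed loop visits indices 0, 1 and 2 only.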

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@sgeor255 sgeor255 requested a review from a team as a code owner September 23, 2024 09:38
@github-actions github-actions bot added the platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic label Sep 23, 2024
@mgouicem
Contributor

make test
disable device_cpu
enable device_gpu
enable thr_generic
enable arch_rtx

@mgouicem
Contributor

Thanks @sgeor255, I am still seeing a lot of test cases failing in the pre-commit testing for sum.

358: tests:1408 passed:248 skipped:816 mistrusted:36 unimplemented:0 invalid_arguments:0 failed:308 listed:0
358: total: 8.10s; fill: 0.63s (8%); compute_ref: 0.15s (2%); compare: 2.45s (30%);
1/1 Test #358: test_benchdnn_modeC_sum_ci_gpu ...***Failed    8.24 sec

Here is an example:

358: run: --mode-modifier=P --sum --engine=gpu --sdt=f32:u8:s8 --ddt=s8 --stag=axb:axb:axb --scales=0.25:2:0.5 4x16x8x10x2
358: [   0][DST][0:0:0:0:0] exp_f32:    -18.9375 exp:    -18.9375 got:         nan diff:     nan rdiff:     nan
358: [   1][DST][0:0:0:0:1] exp_f32:     -18.125 exp:     -18.125 got:         nan diff:     nan rdiff:     nan
358: [   2][DST][0:0:0:1:0] exp_f32:    -17.3125 exp:    -17.3125 got:         nan diff:     nan rdiff:     nan
358: [   3][DST][0:0:0:1:1] exp_f32:         -16 exp:         -16 got:         nan diff:     nan rdiff:     nan
358: [   4][DST][0:0:0:2:0] exp_f32:    -15.1875 exp:    -15.1875 got:         nan diff:     nan rdiff:     nan
358: [   5][DST][0:0:0:2:1] exp_f32:     -14.375 exp:     -14.375 got:         nan diff:     nan rdiff:     nan
358: [   6][DST][0:0:0:3:0] exp_f32:    -13.5625 exp:    -13.5625 got:         nan diff:     nan rdiff:     nan
358: [   7][DST][0:0:0:3:1] exp_f32:      -12.75 exp:      -12.75 got:         nan diff:     nan rdiff:     nan
358: [   8][DST][0:0:0:4:0] exp_f32:    -11.4375 exp:    -11.4375 got:         nan diff:     nan rdiff:     nan
358: [   9][DST][0:0:0:4:1] exp_f32:     -10.625 exp:     -10.625 got:         nan diff:     nan rdiff:     nan
358: [COMPARE_STATS][DST]: trh=0 err_max_diff:     nan err_max_rdiff:     nan all_max_diff:     nan all_max_rdiff:     nan
358: 1401:FAILED (errors:10240 total:10240) __REPRO: --mode-modifier=P --sum --engine=gpu --sdt=f32:u8:s8 --ddt=s8 --stag=axb:axb:axb --scales=0.25:2:0.5 4x16x8x10x2

Should this PR fix most of these issues, or are they all different? (It seems they all fail by getting NaN in their output, even though it seems fishy to get NaN for an s8 output...)

@densamoilov
Contributor

@sgeor255, thanks for the fix. There is an internal tracker for this bug; can you please update it and add a link to this PR there as well?

@sgeor255
Contributor Author

This PR only fixes the segmentation fault in sum when there are more than 8 inputs, which was reproducible by running the sum example in examples/primitives. I did run the tests locally, but with all other implementations skipped, so I did not see the above failures (that test was skipped in that configuration). I have now run it again without skipping any implementations and I can reproduce the above failure. It seems the ref:any implementation is picked for that test. We can look into fixing this in a separate PR.
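The "more than 8 inputs" condition suggests that the inputs are split across several base primitives. Below is a minimal sketch of such a split, assuming each base primitive accepts at most 8 inputs (the threshold from the comment above). The names split_inputs and max_inputs_per_prim are hypothetical, and the real implementation may group inputs differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption for illustration: one base primitive accepts at most 8 inputs,
// matching the "more than 8 inputs" threshold mentioned above.
constexpr std::size_t max_inputs_per_prim = 8;

// Hypothetical helper: returns how many inputs each base primitive covers
// when n_inputs tensors are summed.
std::vector<std::size_t> split_inputs(std::size_t n_inputs) {
    std::vector<std::size_t> chunks;
    for (std::size_t start = 0; start < n_inputs; start += max_inputs_per_prim)
        chunks.push_back(std::min(max_inputs_per_prim, n_inputs - start));
    // Number of base primitives is ceil(n_inputs / max_inputs_per_prim).
    assert(chunks.size()
            == (n_inputs + max_inputs_per_prim - 1) / max_inputs_per_prim);
    return chunks;
}
```

For example, 9 inputs give chunks {8, 1}, that is, two base primitives, so any loop over base_prims_ must stop at index 1.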

@densamoilov densamoilov merged commit b93e118 into uxlfoundation:main Sep 25, 2024
16 of 17 checks passed
@vpirogov vpirogov added this to the v3.7 milestone Dec 11, 2024
Labels
platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic

6 participants