Do you see any test cases failing because of this issue?
We have the same logic in the cuDNN-based implementation. Do you see any failures when it is used?
https://github.com/oneapi-src/oneDNN/blob/de69d44024ab4f64b20deb7aa066a65c867f1123/src/gpu/nvidia/cudnn_pooling_impl.hpp#L99-L112
Notably, the test case
--pool --engine=gpu ic64iw32ow16kw3sw2pw0
fails in benchdnn for AMD but not for NVIDIA.
AMD output:
NVIDIA output:
Build command:
Tested on an MI210 for AMD and an A100 for NVIDIA.
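For readers unfamiliar with the problem descriptor, here is a minimal sketch of how `ic64iw32ow16kw3sw2pw0` can be decoded. It assumes that `pw` is the left padding and that the output width follows the usual pooling formula `ow = (iw + pl + pr - kw) // sw + 1`. The helper names `parse_desc` and `implied_right_pad` are hypothetical and not part of benchdnn. Under those assumptions the requested `ow16` implies one element of right padding. Whether that asymmetry is related to the failure is not established in this thread.

```python
import re


def parse_desc(desc):
    """Split a descriptor such as 'ic64iw32ow16kw3sw2pw0' into {name: value}."""
    return {name: int(value) for name, value in re.findall(r"([a-z]+)(\d+)", desc)}


def implied_right_pad(p):
    """Smallest right padding pr that satisfies
    ow = (iw + pl + pr - kw) // sw + 1, assuming pw is the left padding pl."""
    return max(0, (p["ow"] - 1) * p["sw"] + p["kw"] - p["iw"] - p["pw"])


p = parse_desc("ic64iw32ow16kw3sw2pw0")

# Output width if no right padding were applied at all.
ow_without_right_pad = (p["iw"] + p["pw"] - p["kw"]) // p["sw"] + 1

print(p)
print("ow with pr=0:", ow_without_right_pad)
print("implied right padding:", implied_right_pad(p))
```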
My guess is that MIOpen takes strides for descriptor creation, whereas cuDNN takes formats.
I would be curious to see what format gets passed there, though.
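To illustrate the strides-versus-format distinction, here is a small sketch of how one set of logical dimensions maps to different strides depending on the memory layout. The helper `strides_for` and the batch size of 2 are assumptions made for illustration. The code calls neither MIOpen nor cuDNN.

```python
def strides_for(dims, order):
    """Return dense strides per axis.

    dims:  logical sizes keyed by axis name, e.g. {"n": 2, "c": 64, "w": 32}
    order: axis names from outermost to innermost, e.g. "ncw" or "nwc"
    """
    strides = {}
    step = 1
    for axis in reversed(order):  # the innermost axis gets stride 1
        strides[axis] = step
        step *= dims[axis]
    return strides


# Hypothetical source tensor for the pooling case above (batch size assumed).
dims = {"n": 2, "c": 64, "w": 32}

ncw = strides_for(dims, "ncw")  # channels-first layout
nwc = strides_for(dims, "nwc")  # channels-last layout

print("ncw strides:", ncw)
print("nwc strides:", nwc)
```

A stride-based descriptor receives these numbers directly, so a mismatch between the strides and the actual layout would not be caught by a format tag.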