Some questions about setting the descriptor for wgmma. #2223
Unanswered
linuxlonelyeagle asked this question in Q&A
Replies: 2 comments 12 replies
-
@hwu36 @thakkarV Sorry to bother you, but I thought you might know what to do. I've been researching this issue for over ten days now.
-
Other than that, I found errors in the documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/#async-warpgroup-k-no-swizzle-tf32
-
I'm working on writing a single GEMM directly with the wgmma PTX instructions, but I've run into some problems. This one concerns the matrix descriptor, and I've been stuck on it for a long time.
I'm going to describe my progress on this issue in more detail below.
At first, my program ran without errors, but it did not produce correct results. The results looked strange; since the rest of the program should be fine, I concluded that the descriptors passed to wgmma must not be set correctly.
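A quick way to confirm that the numbers (not just the launch) are wrong is to compare the device output with a plain CPU reference. The sketch below assumes A (M×K) and B (N×K) are both stored K-major (K contiguous) and D is M×N row-major; `reference_gemm` and `max_abs_error` are hypothetical helpers, not CUTLASS/CuTe APIs. Because TF32 keeps only 10 explicit mantissa bits, compare against a tolerance rather than exact equality.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: D[m][n] = sum_k A[m][k] * B[n][k].
// A is M x K and B is N x K, both K-major (K is the contiguous dimension);
// D is M x N, row-major.
std::vector<float> reference_gemm(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  int M, int N, int K) {
  assert(A.size() == static_cast<std::size_t>(M) * K);
  assert(B.size() == static_cast<std::size_t>(N) * K);
  std::vector<float> D(static_cast<std::size_t>(M) * N, 0.0f);
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
        acc += A[static_cast<std::size_t>(m) * K + k] *
               B[static_cast<std::size_t>(n) * K + k];
      }
      D[static_cast<std::size_t>(m) * N + n] = acc;
    }
  }
  return D;
}

// Largest absolute element-wise difference between two result buffers.
float max_abs_error(const std::vector<float>& got,
                    const std::vector<float>& expected) {
  assert(got.size() == expected.size());
  float worst = 0.0f;
  for (std::size_t i = 0; i < got.size(); ++i) {
    worst = std::fmax(worst, std::fabs(got[i] - expected[i]));
  }
  return worst;
}
```

Usage: copy D back from the GPU, call `reference_gemm` on the same host-side A and B, and check that `max_abs_error` stays within a small tolerance.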
I started studying how CuTe does this and found make_gmma_desc in CuTe. It was a real pain to use directly, so I ported it to a version that runs on the CPU. Then I initialized my A and B tensors.
The layout follows the one described at https://docs.nvidia.com/cuda/parallel-thread-execution/#async-warpgroup-k-no-swizzle-tf32.
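For reference, here is a sketch of where each element lives in a K-major, non-swizzled operand tile, under my reading of the PTX description: the tile is made of "core matrices" of 8 rows × 16 bytes (8 × 4 TF32 values), each stored as 128 contiguous bytes; LBO is the byte distance between core matrices adjacent in K, and SBO is the distance between core matrices adjacent in M (or N). `kmajor_noswizzle_offset` and `pack_kmajor_noswizzle` are hypothetical helpers for filling the shared-memory image on the host; treat the LBO/SBO interpretation as an assumption to verify against the spec.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offset of element (row, k) in a K-major, non-swizzled tile of 4-byte
// elements (TF32 stored as float), assuming 8 x 16-byte core matrices that
// are each 128 contiguous bytes.
//   lbo_bytes: distance between core matrices adjacent in K (assumed)
//   sbo_bytes: distance between core matrices adjacent in M/N (assumed)
std::uint32_t kmajor_noswizzle_offset(std::uint32_t row, std::uint32_t k,
                                      std::uint32_t lbo_bytes,
                                      std::uint32_t sbo_bytes) {
  constexpr std::uint32_t kElemBytes = 4;                          // TF32
  constexpr std::uint32_t kRowBytes = 16;                          // core-matrix row
  constexpr std::uint32_t kElemsPerRow = kRowBytes / kElemBytes;   // 4
  return (row / 8) * sbo_bytes              // core matrix along M/N
       + (row % 8) * kRowBytes              // row inside the core matrix
       + (k / kElemsPerRow) * lbo_bytes     // core matrix along K
       + (k % kElemsPerRow) * kElemBytes;   // element inside the row
}

// Scatter a row-major M x K tile into the assumed no-swizzle K-major layout.
std::vector<float> pack_kmajor_noswizzle(const std::vector<float>& src,
                                         std::uint32_t M, std::uint32_t K,
                                         std::uint32_t lbo_bytes,
                                         std::uint32_t sbo_bytes) {
  assert(M % 8 == 0 && K % 4 == 0);
  assert(src.size() == static_cast<std::size_t>(M) * K);
  std::vector<float> dst(src.size(), 0.0f);
  for (std::uint32_t r = 0; r < M; ++r) {
    for (std::uint32_t k = 0; k < K; ++k) {
      std::uint32_t off = kmajor_noswizzle_offset(r, k, lbo_bytes, sbo_bytes);
      assert(off / 4 < dst.size());  // offsets must stay inside the dense tile
      dst[off / 4] = src[static_cast<std::size_t>(r) * K + k];
    }
  }
  return dst;
}
```

For a densely packed 64 × 8 TF32 A tile (the m64nNk8 shape) with the eight M-direction core matrices stored first, this gives SBO = 128 bytes and LBO = 1024 bytes, i.e. 8 and 64 in 16-byte units.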
Then I used my port of make_gmma_desc (the code is the same as upstream, except that it prints the leading dimension byte offset and stride dimension byte offset) to obtain those two offsets.
For matrix A: LBO = 64, SBO = 8; for matrix B: LBO = 16, SBO = 8.
I then applied these leading and stride dimension byte offsets in my program, but it still does not produce correct results.
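For reference, this sketch shows how byte offsets end up in the 64-bit descriptor passed to wgmma.mma_async. Each address-like field stores a byte value shifted right by 4 (16-byte units), so if the values printed by the CuTe port are already in 16-byte units (as CuTe's descriptor fields appear to be), they go into the fields as-is and must not be shifted again. `make_wgmma_desc`, `desc_lbo_bytes`, and `desc_sbo_bytes` are hypothetical names; the bit positions follow my reading of the PTX ISA matrix-descriptor table and should be double-checked against the PTX version you target. On the device, the start address would be a shared-memory address (for example from `__cvta_generic_to_shared`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper packing a wgmma matrix descriptor (assumed layout):
//   bits  0-13 : start address,                 (addr & 0x3FFFF) >> 4
//   bits 16-29 : leading dimension byte offset, (lbo  & 0x3FFFF) >> 4
//   bits 32-45 : stride dimension byte offset,  (sbo  & 0x3FFFF) >> 4
//   bits 49-51 : base offset (left 0 here; only used with swizzling)
//   bits 62-63 : swizzle mode (0 = no swizzle)
std::uint64_t make_wgmma_desc(std::uint32_t smem_addr, std::uint32_t lbo_bytes,
                              std::uint32_t sbo_bytes, std::uint32_t swizzle) {
  assert(swizzle < 4);
  // Encode a byte value into a 14-bit field in 16-byte units.
  auto enc = [](std::uint32_t x) -> std::uint64_t {
    return static_cast<std::uint64_t>((x & 0x3FFFF) >> 4);
  };
  std::uint64_t desc = 0;
  desc |= enc(smem_addr);
  desc |= enc(lbo_bytes) << 16;
  desc |= enc(sbo_bytes) << 32;
  desc |= static_cast<std::uint64_t>(swizzle) << 62;
  return desc;
}

// Decode the LBO/SBO fields back to byte offsets to check what was encoded.
std::uint32_t desc_lbo_bytes(std::uint64_t desc) {
  return static_cast<std::uint32_t>((desc >> 16) & 0x3FFF) << 4;
}
std::uint32_t desc_sbo_bytes(std::uint64_t desc) {
  return static_cast<std::uint32_t>((desc >> 32) & 0x3FFF) << 4;
}
```

With LBO = 1024 bytes and SBO = 128 bytes, the raw fields hold 64 and 8, the same numbers printed for matrix A above.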
I then started adjusting the contents of the A and B matrices. Take the A matrix as an example:
[Image: contents of the A matrix, first variant]
or
[Image: contents of the A matrix, second variant]
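One systematic way to adjust the inputs is to make B a one-hot selector: if B (N × K, K-major) has B[n][k] = 1 exactly when k == n, then D[m][n] = A[m][n] for n < K, so D shows directly which element of A the hardware read at each position. Filling A with distinct small integers (its own row-major index) then makes a wrong LBO/SBO visible as a permutation in D. This is a hypothetical debugging sketch assuming N ≥ K and K-major storage for both operands; the helper names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A (M x K, K-major): each element holds its own row-major index. Integers
// below 2048 are exactly representable in TF32 (11 significant bits).
std::vector<float> make_index_A(int M, int K) {
  assert(M * K <= 2048);
  std::vector<float> A(static_cast<std::size_t>(M) * K);
  for (std::size_t i = 0; i < A.size(); ++i) A[i] = static_cast<float>(i);
  return A;
}

// B (N x K, K-major): one-hot selector with B[n][k] = 1 iff k == n.
// Then D[m][n] = sum_k A[m][k] * B[n][k] = A[m][n] for n < K (0 for n >= K).
std::vector<float> make_selector_B(int N, int K) {
  assert(N >= K);
  std::vector<float> B(static_cast<std::size_t>(N) * K, 0.0f);
  for (int n = 0; n < K; ++n) B[static_cast<std::size_t>(n) * K + n] = 1.0f;
  return B;
}

// Given D (M x N, row-major) from the device, return which A index landed
// at (m, n); with correct descriptors this should be m * K + n.
int a_index_seen_at(const std::vector<float>& D, int N, int m, int n) {
  return static_cast<int>(D[static_cast<std::size_t>(m) * N + n]);
}
```

If `a_index_seen_at(D, N, m, n)` differs from `m * K + n`, the pattern of mismatches (for example, whole groups of 8 rows or 4 columns swapped) hints at whether the M/N-direction or K-direction offset is wrong.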
Can anyone see where I went wrong? I'd really appreciate any help. This matters to me: understanding this issue will help me contribute to open-source compilers such as LLVM/MLIR. Thanks, all.