-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QR on a single matrix: valgrind reports invalid reads and writes #2328
Comments
Thanks for reporting, we will have a look at this |
Hi @lucbv . Thanks for looking into this. It seems like the problem is related to handling rectangular matrices. If the
there are no errors under valgrind. Digging into the code a bit, but without a full understanding of it, I see that this loop over matrix rows:
that successively removes one row and one column to form the 3x3 partitioned matrix In the original case of the 16x10 matrix, running valgrind with the gdbserver I see that the first invalid read occurs in The test case in the repo for QR appears to only run with square matrices: test without column pivoting:
test 'WithColumnPivoting': kokkos-kernels/batched/dense/unit_test/Test_Batched_TeamVectorQR_WithColumnPivoting.hpp Line 121 in 2c4dd7e
|
Okay, thanks for digging a bit into this, I will run the code in valgrind / gdb as well and hopefully can reproduce and report my observation. The algorithm indeed uses a partitioning in the matrix to perform some operations but it should still work for rectangular matrices. Once I find something promising I will let you know about it : ) |
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well.
The PR above, #2342, has a fix for the rectangular matrices and introduces more tests for the Serial QR feature. The tests are not fully implemented yet but the fix seems to be okay if you want to give it a try. |
This is great. Thank you @lucbv. Using the PR branch (9121f0a) I ran the reproducer under valgrind again and the Running the expanded version of the reproducer: stack at first Trsv invalid readRunning under gdb reports the following values of variables at the point of the first reported invalid read.
Given this loop from
p twice to compute the index into A here (where the invalid read occurs):
valgrind log
|
Okay, I will try to wrap up the PR and get that tested and merged, then I can move on to trsv, hopefully it's not more complicated than the QR fix but writing proper tests is what takes time! |
So I have not looked at it in detail but my guess is that we are assuming the triangular matrix to be stored in a square matrix, size |
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well.
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Signed-off-by: Luc <lberge@sandia.gov>
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Signed-off-by: Luc <lberge@sandia.gov>
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Signed-off-by: Luc <lberge@sandia.gov>
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Fixing unused variable error It looks like the Left NoTranspose ApplyQ is not doing the correct thing. Will have a look at that next. Signed-off-by: Luc <lberge@sandia.gov>
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Fixing unused variable error It looks like the Left NoTranspose ApplyQ is not doing the correct thing. Will have a look at that next. Signed-off-by: Luc <lberge@sandia.gov> Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Fixing unused variable error It looks like the Left NoTranspose ApplyQ is not doing the correct thing. Will have a look at that next. Signed-off-by: Luc <lberge@sandia.gov> Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
@lucbv what's the status of this? We would like to make use of this. |
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Fixing unused variable error It looks like the Left NoTranspose ApplyQ is not doing the correct thing. Will have a look at that next. Spliting the tests a bit better, looking at Q and Qt together to detect issues with consistency between the two. Also eventually allows to use GEMM to figure out which one is wrong. Final fixes and adjusting the tolerance Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
@jacobmerson I am cleaning it up now, it should merge on Monday |
Great thank you! |
* batched - dense: Testing and fixing Serial QR The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue #2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Fixing unused variable error It looks like the Left NoTranspose ApplyQ is not doing the correct thing. Will have a look at that next. Spliting the tests a bit better, looking at Q and Qt together to detect issues with consistency between the two. Also eventually allows to use GEMM to figure out which one is wrong. Final fixes and adjusting the tolerance Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov> * Removing the stride for the workspace Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov> * Applying clang-format Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov> * Adding checks on inputs Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov> * More clean-ups Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov> * clang-format Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov> * Batched - QR: adding static assertion on struct templates Checking that the side, mode and algo are all valid. Cleaning up the test by using create_mirror_view_and_copy when appropriate. Somehow the semantic of create_mirror_view_and_copy is different from the regular deep_copy so need to make a few changes after testing on GPUs... Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov> * Batched - QR: adding check for apply Q on the right side. Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov> * clang-format Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov> * clang-format Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov> --------- Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
The PR has been merged so you should be able to test again, please let us know if you see any issues? |
Hello,
Calling
SerialQR
on a single matrix defined asKokkos::View<double[16][10]>
and running with the Kokkos Serial backend results in valgrind invalid read and write errors (pasted below). The reproducer is pasted below.Interestingly, when using the CUDA backend in an expanded version of the reproducer (which includes a result comparison after applying the QR factorization via
ApplyQ
andTrsv
) there are no obvious issues.Note, I'm still figuring out how the QR interface works, hence the single matrix input to QR.
Am I doing anything obviously wrong here? Any help is appreciated.
reproducer
kokkos and kokkos-kernels build
I'm building kokkos (develop @ c2a342b26) and kokkos-kernels (develop @ f26fbca) with the following cmake commands using GCC 12.3.0 on a RHEL9 system.
valgrind errors
The text was updated successfully, but these errors were encountered: