-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add runtime memory monitoring functions #2
base: main
Are you sure you want to change the base?
Conversation
Add constants
} | ||
|
||
template <typename Space = Kokkos::DefaultExecutionSpace> | ||
void MemGetInfo(size_t* free, size_t* total) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not fond of this API. Can we return the value instead of playing with addresses (Is it host or device pointer ? etc. )
We could also a simple getFreeMemSize
or something like this to return only one value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A boolean flag could also be either passed as an argument or returned in order to be sure whether the total and free memory available were obtained successfully. Right now there are unsuccessful code paths and there is no way to know that.
cudaMemGetInfo returns a cudaError_t.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more modern C++ will be to use std::optional
or expected
the day it will become available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that std::optional
or std::expected
based approach is a good one. std::expected
being even more suitable as the return type of a function.
However, I am feeling that it is better to go with the interface of cudaMemGetInfo
. There is a hipMemGetInfo
also with the same interface. There will be less cognitive load on the user.
|
||
// Single node memory info | ||
template <> | ||
void MemGetInfo<Kokkos::HostSpace>(size_t* free, size_t* total) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you happen to know if it works from inside a cpuset? Very often, in hpc, scheduler like slurm will put your process in a cpuset with limited resources.
memInfo/unit_test/CMakeLists.txt
Outdated
|
||
FetchContent_Declare( | ||
googletest | ||
URL https://github.com/google/googletest/archive/03597a01ee50ed33e9dfd640b249b4be3799d395.zip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also add URL_HASH to ensure that an uncorrupted archive is downloaded: https://stackoverflow.com/questions/72824695/usage-of-url-hash-in-fetchcontent-declare
FetchContent_MakeAvailable(googletest) | ||
|
||
enable_testing() | ||
add_executable(MemInfoTest TestMemInfo.cpp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make MemInfoTest a variable. It is referred in three places. In the future, if we want all tests to have a suffix like "Kokkos_", it will be convenient to change in only one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not agree with this one, a query-replace is easy and not using variables in CMake is more readable.
It is up to you @AdRi1t .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion about this. Anything will work.
#include <Kokkos_Core.hpp> | ||
#include <cexa_MemInfo.hpp> | ||
#include <gtest/gtest.h> | ||
#include <cstddef> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be no enforcing of a standard way to group headers in Kokkos. Maybe you could group them according to Google Style Guide.
memInfo/unit_test/TestMemInfo.cpp
Outdated
|
||
template <typename MemorySpace = void> | ||
void testMemInfo() { | ||
size_t step1_total = 0ul; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::size_t is more maintainable.
memInfo/unit_test/TestMemInfo.cpp
Outdated
size_t step2_free = 0ul; | ||
size_t step1_free = 0ul; | ||
|
||
if constexpr (std::is_same<MemorySpace, void>::value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using std::is_same_v is preferable.
Explicitly including <type_traits> is preferable.
As noted above, this if-then-else logic could be avoided and there could be a single call to Kokkos::Experimental::MemGetInfo<Space>(&step1_free, &step1_total)
.
memInfo/unit_test/TestMemInfo.cpp
Outdated
#include <gtest/gtest.h> | ||
#include <cstddef> | ||
|
||
template <typename MemorySpace = void> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the name MemorySpace
to Space
, in order match with the implementation.
Also, the default type could be Kokkos::DefaultExecutionSpace
. Then the if-then-else block based on std::is_same<MemorySpace, void>::value)
could be avoided. There can just be a call to Kokkos::Experimental::MemGetInfo<Space>(&step1_free, &step1_total)
.
memInfo/unit_test/TestMemInfo.cpp
Outdated
} else { | ||
Kokkos::Experimental::MemGetInfo<MemorySpace>(&step1_free, &step1_total); | ||
} | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a scoped block?
memInfo/unit_test/TestMemInfo.cpp
Outdated
{ | ||
// Allocate 128 MiB of memory | ||
Kokkos::View<double**, MemorySpace> data("data test", 128, 1024*1024); | ||
if constexpr (std::is_same<MemorySpace, void>::value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as earlier comment, check whether this if-this-else block is needed.
Kokkos::finalize(); | ||
} | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add similar test for Sycl.
// Check that free memory is less after allocation | ||
EXPECT_LT(step2_free, step1_free); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is repeating logic in the tests. The tests are one of the two types for different execution spaces:
Kokkos::initialize();
testMemInfo<Space_i>(); // One or two spaces
Kokkos::finalize();
size_t free = 0;
size_t total = 0;
Kokkos::Experimental::MemGetInfo<SOME_SPACE>(&free, &total);
EXPECT_GT(free, 0);
EXPECT_GT(total, 0);
But it is ok if you don't refactor.
if (GlobalMemoryStatusEx(&statex) != 0) { | ||
*free = statex.ullAvailPhys; | ||
*total = statex.ullTotalPhys; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A return
statement could be added.
*free = info.freeram * info.mem_unit; | ||
*total = info.totalram * info.mem_unit; | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A return
statement could be added.
Signed-off-by: Adrien Taberner <adrien.taberner@outlook.com>
// Allocate 64 MiB of memory | ||
Kokkos::View<double**, typename Space::memory_space> data("data test", 1024, 8192); | ||
Kokkos::fence(); | ||
Kokkos::Experimental::MemGetInfo<Space>(&step2_free, &step2_total); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add these tests also here:
EXPECT_GT(step2_free, 0);
EXPECT_GT(step2_total, 0);
You had these tests earlier. You have removed these tests, while creating the macro TEST_SPACE
.
EXPECT_GT(total, 0); | ||
} | ||
#define TEST_SPACE(Space) \ | ||
TEST(MemInfo, Space) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are calling the function testMemInfo
. When an error happens inside it, it will be difficult to figure out which invocation it is from. Inserting a SCOPED_TRACE could help.
No description provided.