Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add runtime memory monitoring functions #2

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Add runtime memory monitoring functions #2

wants to merge 7 commits into from

Conversation

AdRi1t
Copy link

@AdRi1t AdRi1t commented Feb 26, 2025

No description provided.

}

template <typename Space = Kokkos::DefaultExecutionSpace>
void MemGetInfo(size_t* free, size_t* total) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not fond of this API. Can we return the value instead of playing with addresses (Is it host or device pointer ? etc. )

We could also a simple getFreeMemSize or something like this to return only one value.

Copy link

@science-enthusiast science-enthusiast Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A boolean flag could also be either passed as an argument or returned in order to be sure whether the total and free memory available were obtained successfully. Right now there are unsuccessful code paths and there is no way to know that.

cudaMemGetInfo returns a ​cudaError_t.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more modern C++ will be to use std::optional or expected the day it will become available.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that std::optional or std::expected based approach is a good one. std::expected being even more suitable as the return type of a function.

However, I am feeling that it is better to go with the interface of cudaMemGetInfo. There is a hipMemGetInfo also with the same interface. There will be less cognitive load on the user.


// Single node memory info
template <>
void MemGetInfo<Kokkos::HostSpace>(size_t* free, size_t* total) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you happen to know if it works from inside a cpuset? Very often, in hpc, scheduler like slurm will put your process in a cpuset with limited resources.


FetchContent_Declare(
googletest
URL https://github.com/google/googletest/archive/03597a01ee50ed33e9dfd640b249b4be3799d395.zip

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also add URL_HASH to ensure that an uncorrupted archive is downloaded: https://stackoverflow.com/questions/72824695/usage-of-url-hash-in-fetchcontent-declare

FetchContent_MakeAvailable(googletest)

enable_testing()
add_executable(MemInfoTest TestMemInfo.cpp)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make MemInfoTest a variable. It is referred in three places. In the future, if we want all tests to have a suffix like "Kokkos_", it will be convenient to change in only one place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not agree with this one, a query-replace is easy and not using variables in CMake is more readable.
It is up to you @AdRi1t .

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion about this. Anything will work.

#include <Kokkos_Core.hpp>
#include <cexa_MemInfo.hpp>
#include <gtest/gtest.h>
#include <cstddef>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be no enforcing of a standard way to group headers in Kokkos. Maybe you could group them according to Google Style Guide.


template <typename MemorySpace = void>
void testMemInfo() {
size_t step1_total = 0ul;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::size_t is more maintainable.

size_t step2_free = 0ul;
size_t step1_free = 0ul;

if constexpr (std::is_same<MemorySpace, void>::value) {
Copy link

@science-enthusiast science-enthusiast Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using std::is_same_v is preferable.

Explicitly including <type_traits> is preferable.

As noted above, this if-then-else logic could be avoided and there could be a single call to Kokkos::Experimental::MemGetInfo<Space>(&step1_free, &step1_total).

#include <gtest/gtest.h>
#include <cstddef>

template <typename MemorySpace = void>
Copy link

@science-enthusiast science-enthusiast Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the name MemorySpace to Space, in order match with the implementation.

Also, the default type could be Kokkos::DefaultExecutionSpace. Then the if-then-else block based on std::is_same<MemorySpace, void>::value) could be avoided. There can just be a call to Kokkos::Experimental::MemGetInfo<Space>(&step1_free, &step1_total).

} else {
Kokkos::Experimental::MemGetInfo<MemorySpace>(&step1_free, &step1_total);
}
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a scoped block?

{
// Allocate 128 MiB of memory
Kokkos::View<double**, MemorySpace> data("data test", 128, 1024*1024);
if constexpr (std::is_same<MemorySpace, void>::value) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as earlier comment, check whether this if-this-else block is needed.

Kokkos::finalize();
}
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add similar test for Sycl.

// Check that free memory is less after allocation
EXPECT_LT(step2_free, step1_free);
}

Copy link

@science-enthusiast science-enthusiast Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is repeating logic in the tests. The tests are one of the two types for different execution spaces:

Kokkos::initialize();
testMemInfo<Space_i>(); // One or two spaces
Kokkos::finalize();

size_t free = 0;
size_t total = 0;
Kokkos::Experimental::MemGetInfo<SOME_SPACE>(&free, &total);
EXPECT_GT(free, 0);
EXPECT_GT(total, 0);

But it is ok if you don't refactor.

if (GlobalMemoryStatusEx(&statex) != 0) {
*free = statex.ullAvailPhys;
*total = statex.ullTotalPhys;
}
Copy link

@science-enthusiast science-enthusiast Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A return statement could be added.

*free = info.freeram * info.mem_unit;
*total = info.totalram * info.mem_unit;
return;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A return statement could be added.

Signed-off-by: Adrien Taberner <adrien.taberner@outlook.com>
// Allocate 64 MiB of memory
Kokkos::View<double**, typename Space::memory_space> data("data test", 1024, 8192);
Kokkos::fence();
Kokkos::Experimental::MemGetInfo<Space>(&step2_free, &step2_total);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add these tests also here:

EXPECT_GT(step2_free, 0);
EXPECT_GT(step2_total, 0);

You had these tests earlier. You have removed these tests, while creating the macro TEST_SPACE.

EXPECT_GT(total, 0);
}
#define TEST_SPACE(Space) \
TEST(MemInfo, Space) { \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are calling the function testMemInfo. When an error happens inside it, it will be difficult to figure out which invocation it is from. Inserting a SCOPED_TRACE could help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants