Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add threadid to AsyncStackRootHolder #644

Merged
merged 14 commits into from
Dec 4, 2024
3 changes: 2 additions & 1 deletion include/unifex/tracing/async_stack-inl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -136,9 +136,10 @@ inline AsyncStackFrame* AsyncStackRoot::getTopFrame() const noexcept {
}

inline void AsyncStackRoot::setStackFrameContext(
frame_ptr framePtr, instruction_ptr ip) noexcept {
frame_ptr framePtr, instruction_ptr ip, uint64_t tId) noexcept {
jesswong marked this conversation as resolved.
Show resolved Hide resolved
stackFramePtr = framePtr;
returnAddress = ip;
threadId = tId;
}

inline frame_ptr AsyncStackRoot::getStackFramePointer() const noexcept {
Expand Down
22 changes: 20 additions & 2 deletions include/unifex/tracing/async_stack.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,18 @@
#include <thread>

#include <unifex/detail/prologue.hpp>
#if defined(__linux__)
#include <unistd.h>
#include <sys/syscall.h>
#endif

#if defined(__APPLE__)
#define UNIFEX_SYS_gettid SYS_thread_selfid
#elif defined(SYS_gettid)
#define UNIFEX_SYS_gettid SYS_gettid
#else
#define UNIFEX_SYS_gettid __NR_gettid
#endif
jesswong marked this conversation as resolved.
Show resolved Hide resolved

namespace unifex {

Expand Down Expand Up @@ -150,7 +162,11 @@ struct AsyncStackRoot;
struct AsyncStackFrame;
namespace detail {
class ScopedAsyncStackRoot;
}
} // namespace detail

namespace utils {
uint64_t getOSThreadID();
jesswong marked this conversation as resolved.
Show resolved Hide resolved
} // namespace utils

// Get access to the current thread's top-most AsyncStackRoot.
//
Expand Down Expand Up @@ -454,7 +470,8 @@ struct AsyncStackRoot {
// normal stack-trace.
void setStackFrameContext(
frame_ptr fp = frame_ptr::read_frame_pointer(),
instruction_ptr ip = instruction_ptr::read_return_address()) noexcept;
instruction_ptr ip = instruction_ptr::read_return_address(),
uint64_t tId = utils::getOSThreadID()) noexcept;
frame_ptr getStackFramePointer() const noexcept;
instruction_ptr getReturnAddress() const noexcept;

Expand Down Expand Up @@ -501,6 +518,7 @@ struct AsyncStackRoot {
// Typically initialise with instruction_ptr::read_return_address() or
// setStackFrameContext().
instruction_ptr returnAddress;
uint64_t threadId = 0;
};

namespace detail {
Expand Down
18 changes: 18 additions & 0 deletions source/async_stack.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,10 @@
#include <cassert>
#include <mutex>

#if defined(_WIN32)
#include <windows.h>
#endif

#if !defined(UNIFEX_ASYNC_STACK_ROOT_USE_PTHREAD)
#if defined(__linux__)
# define UNIFEX_ASYNC_STACK_ROOT_USE_PTHREAD 1
Expand Down Expand Up @@ -145,6 +149,20 @@ static thread_local AsyncStackRootHolder currentThreadAsyncStackRoot;

} // namespace

namespace utils {
uint64_t getOSThreadID() {
#if defined(__APPLE__)
uint64_t tid;
pthread_threadid_np(nullptr, &tid);
return tid;
jesswong marked this conversation as resolved.
Show resolved Hide resolved
#elif defined(_WIN32)
return uint64_t(GetCurrentThreadId());
#else
return uint64_t(syscall(UNIFEX_SYS_gettid));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid defining UNIFEX_SYS_gettid by just switching here on whether SYS_gettid is defined?

Also, I'm a bit anxious about the cost of a syscall in each AsyncStackRoot constructor since we construct so many of them. Is this a cheap syscall implemented in user space? If so then it doesn't matter, but if it's a typical syscall that transits into kernel space then we should probably revisit capturing this value in the AsyncStackRootHolder since we'll have one of those per thread and the per-thread value of this is constant.

Copy link
Contributor Author

@jesswong jesswong Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I updated this to use the glibc wrapper of gettid because it implements caching (D49174906)

I also spoke to David and his response is that it's difficult to say because it depends on the implementation and we need to profile the code in order to know. This will be used a lot on android and his response is below:

I see. The overhead of the syscall is mostly dependent on what speculative execution mitigations are turned on e.g. is the kernel hardened. I don't know how we configure our Android kernels.
The more mitigations the costlier the overhead of a user to kernel transition
The only real way to know is to profile the code. But as a hunch thread creation should be off the hot path so I don't imagine adding a gettid in there will be problematic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced with @ispeters offline and decided to move threadid to async stack root holder for getting the threadid per thread only

#endif
}
}

AsyncStackRoot* tryGetCurrentAsyncStackRoot() noexcept {
return currentThreadAsyncStackRoot.get();
}
Expand Down
Loading