libcu++ (libcudacxx) provides fundamental, idiomatic C++ abstractions that aim to make the lives of CUDA C++ developers easier.
Specifically, libcu++ provides:
- C++ Standard Library features usable in both host and device code
- Extensions to C++ Standard Library features
- Fundamental, CUDA-specific programming model abstractions
If you are a C++ developer, then you know the C++ Standard Library (sometimes referred to as "The STL") as the library that ships with your compiler and provides things like std::string, std::vector, or std::atomic.
It provides the fundamental abstractions that C++ developers need to build high quality applications and libraries.
By default, these abstractions aren't available when writing CUDA C++ device code because they don't have the necessary __host__ __device__ decorators, and their implementation may not be suitable for use in and across host and device code.
libcu++ aims to solve this problem by providing an opt-in, incremental, heterogeneous implementation of C++ Standard Library features:
- Opt-in: It does not replace the Standard Library provided by your host compiler (aka anything in std::)
- Incremental: It does not provide a complete C++ Standard Library implementation
- Heterogeneous: It works in both host and device code, and its objects can be passed between host and device code.
If you know how to use things like the <atomic> or <type_traits> headers from the C++ Standard Library, then you know how to use libcu++.
All you have to do is add cuda/std/ to the start of your includes and cuda:: before any uses of std::, for example:
#include <cuda/std/atomic>
cuda::std::atomic<int> x;
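As a minimal sketch of that include-and-namespace swap, the block below writes one function against an alias that resolves to cuda::std when the libcu++ headers are found and falls back to std otherwise. The names stdx, HD, and take_ticket are hypothetical and exist only for this illustration. The fallback branch is an assumption made so the sketch also builds without the CUDA Toolkit.

```cpp
// Sketch only: `stdx`, `HD`, and `take_ticket` are hypothetical names, not libcu++ API.
#if __has_include(<cuda/std/atomic>)
#  include <cuda/std/atomic>
#  include <cuda/std/type_traits>
namespace stdx = cuda::std;  // libcu++ found: use the heterogeneous implementation
#else
#  include <atomic>
#  include <type_traits>
namespace stdx = std;        // assumed fallback so the sketch builds without libcu++
#endif

#if defined(__CUDACC__)
#  define HD __host__ __device__  // compiled by nvcc: callable from host and device
#else
#  define HD                      // plain host compiler: no decorators needed
#endif

// Hands out consecutive ticket numbers.
// Returns the value the counter held before the increment.
template <class T>
HD T take_ticket(stdx::atomic<T>& counter) {
  static_assert(stdx::is_integral<T>::value, "ticket type must be an integer");
  return counter.fetch_add(T{1}, stdx::memory_order_relaxed);
}
```

The body of take_ticket is identical in both configurations; only the headers and the namespace alias change. Relaxed ordering is chosen here because the sketch only needs each caller to receive a unique ticket, not to order other memory operations.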
Note
libcu++ does not provide its own documentation for Standard Library features. Instead, libcu++ documents which Standard Library headers are made available, and defers documentation of individual features within those headers to other sources like cppreference.
libcu++ provides CUDA C++ developers with familiar Standard Library utilities to improve productivity and flatten the CUDA learning curve. However, there are many aspects of writing high-performance CUDA C++ code that cannot be expressed through purely Standard-conforming APIs. For these cases, libcu++ also provides extensions of Standard Library utilities.
For example, libcu++ extends atomic<T> and other synchronization primitives with the notion of a "thread scope" that controls the strength of the memory fence.
To use utilities that are extensions to Standard Library features, drop the std from both the include path and the namespace:
#include <cuda/atomic>
cuda::atomic<int, cuda::thread_scope_device> x;
See the Extended API section for more information.
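To illustrate how a scoped atomic is used like an ordinary one, here is a hedged sketch of a slot allocator built on cuda::atomic with cuda::thread_scope_device. The names slot_counter, HD, and reserve_slot are hypothetical. The std::atomic fallback, which has no notion of thread scope, is an assumption added so the sketch builds without libcu++.

```cpp
// Sketch only: `slot_counter`, `HD`, and `reserve_slot` are hypothetical names.
#if __has_include(<cuda/atomic>)
#  include <cuda/atomic>
// Device scope: threads on the same device may synchronize through this object.
using slot_counter = cuda::atomic<unsigned, cuda::thread_scope_device>;
#else
#  include <atomic>
// Assumed fallback for builds without libcu++; std::atomic has no thread scope.
using slot_counter = std::atomic<unsigned>;
#endif

#if defined(__CUDACC__)
#  define HD __host__ __device__
#else
#  define HD
#endif

// Reserves the next free slot in a buffer of `capacity` slots.
// Returns the slot index, or -1 if the buffer is already full.
HD inline int reserve_slot(slot_counter& next_free, unsigned capacity) {
  unsigned current = next_free.load();
  while (current < capacity) {
    // On failure, compare_exchange_weak reloads `current` with the latest value.
    if (next_free.compare_exchange_weak(current, current + 1)) {
      return static_cast<int>(current);
    }
  }
  return -1;
}
```

The compare-and-swap loop is used instead of a plain fetch_add so the counter never advances past capacity. A narrower scope such as cuda::thread_scope_block could be substituted when only threads of one block share the counter.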
Some abstractions that libcu++ provides have no equivalent in the C++ Standard Library but are fundamental to the CUDA C++ programming model.
For example, cuda::memcpy_async is a vital abstraction for asynchronous data movement between global and shared memory.
It abstracts hardware features such as LDGSTS on Ampere, and the Tensor Memory Accelerator (TMA) on Hopper.
See the Extended API section for more information.
- std:: / <*>: This is your host compiler's Standard Library that works in __host__ code only, although you can use the --expt-relaxed-constexpr flag to use any constexpr functions in __device__ code. libcu++ does not replace or interfere with your host compiler's Standard Library.
- cuda::std:: / <cuda/std/*>: Strictly conforming implementations of facilities from the Standard Library that work in __host__ __device__ code.
- cuda:: / <cuda/*>: Conforming extensions to the Standard Library that work in __host__ __device__ code.
- cuda::device / <cuda/device/*>: Conforming extensions to the Standard Library that work only in __device__ code.
- cuda::ptx: C++ convenience wrappers for inline PTX (only usable in __device__ code).
// Standard C++, __host__ only.
#include <atomic>
std::atomic<int> x;
// CUDA C++, __host__ __device__.
// Strictly conforming to the C++ Standard.
#include <cuda/std/atomic>
cuda::std::atomic<int> x;
// CUDA C++, __host__ __device__.
// Conforming extensions to the C++ Standard.
#include <cuda/atomic>
cuda::atomic<int, cuda::thread_scope_block> x;
libcu++ is an open source project developed on GitHub. It is NVIDIA's variant of LLVM's libc++. libcu++ is distributed under the Apache License v2.0 with LLVM Exceptions.
libcu++ aims to be a conforming implementation of the C++ Standard, ISO/IEC IS 14882, Clauses 16 through 32.
libcu++ does not maintain long-term ABI stability. Promising long-term ABI stability would prevent us from fixing mistakes and providing best in class performance. So, we make no such promises.
The ABI will be broken with every major CUDA Toolkit release. The life cycle of an ABI version is approximately one year. Long-term support for an ABI version ends after approximately two years. Please see the versioning section for more details.
We recommend that you always recompile your code and dependencies with the latest NVIDIA SDKs and use the latest NVIDIA C++ Standard Library ABI. Live at head.