
Commit 9abcb6e

jsmall-zzz and Tim Foley authored
Support for float atomics on RWByteAddressBuffer (shader-slang#1502)
* Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h
* Keep including the includes directory.
* Added the spirv-tools-generated files.
* We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder.
* Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header.
* First pass at documenting how to build glslang and spirv-tools.
* Improved glsl/spir-v tools README.md
* Added revision.h
* Change how gResources is calculated. Update about revision.h
* Update docs a little.
* Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but *is* necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved.
* Fix direct-spirv-emit output.
* Update to latest version of spirv headers and spirv-tools.
* Upgrade submodule version of glslang in external.
* Add fPIC to build options of slang-spirv-tools
* WIP adding support for InterlockedAddFp32
* Upgrade slang-binaries to have new glslang.
* Fix issues with Windows slang-glslang binaries, via update of slang-binaries used.
* WIP - atomicAdd. This solution can't work as we can't do (float*) in glsl.
* WIP on atomic float ops.
* Added checking for multiple decls that takes into account __target_intrinsic and __specialized_for_target. First pass impl of atomic add on float for glsl.
* Split __atomicAdd so extensions are applied appropriately.
* Made Dxc/Fxc support includes. Use HLSL prelude to pass the path to nvapi. Added -nv-api-path
* Refactor around IncludeHandler and impl of IncludeSystem
* slang-include-handler -> slang-include-system. Have IncludeHandler/Impl defined in slang-preprocessor
* Small comment improvements.
* Document atomic float add addition in target-compatibility.md.
* CUDA float atomic support on RWByteAddressBuffer.
* Add atomic-float-byte-address-buffer-cross.slang
* Removed inappropriate-once.slang - the test is no longer valid when a file is loaded and has a unique identity by default. A test could be made, but would require an API call to create the file (so no unique id). Improved handling of loadFile - uses uniqueId if has one.
* Work around for testing target overlaps - to avoid exceptions on adding targets. Simplify PathInfo setup. Modify single-target-intrinsic.slang - it no longer failed because there were no longer multiple definitions for the same target.

Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
1 parent 697e7fb commit 9abcb6e

31 files changed: +776 -265 lines

docs/target-compatibility.md

+18
````diff
@@ -40,6 +40,7 @@ Items with ^ means there is some discussion about support later in the document
 | Atomics | Yes | Yes | Yes | Yes | No +
 | Atomics on RWBuffer | Yes | Yes | Yes | No | No +
 | Sampler Feedback | No | Yes | No + | No | Yes ^
+| RWByteAddressBuffer Atomic | No | Yes ^ | Yes ^ | Yes | No +
 
 ## Half Type
 
@@ -179,3 +180,20 @@ There doesn't not appear to be a similar feature available in Vulkan yet, but wh
 
 For CPU targets there is the IFeedbackTexture interface that requires an implemention for use. Slang does not currently include CPU implementations for texture types.
 
+## RWByteAddressBuffer Atomic
+
+Currently this feature allows atomic float additions on RWByteAddressBuffer. A future update will broaden the types supported. The following methods are available on RWByteAddressBuffer:
+
+```
+void RWByteAddressBuffer::InterlockedAddFp32(uint byteAddress, float valueToAdd, out float originalValue);
+void RWByteAddressBuffer::InterlockedAddFp32(uint byteAddress, float valueToAdd);
+```
+
+On HLSL-based targets this functionality is achieved using [nvAPI](https://developer.nvidia.com/nvapi), so for the feature to work you must have nvAPI installed on your system. The 'prelude' functionality of the API can then be used to add an include of (or the text of) the relevant files. To see how to do this in practice, look at the function `setSessionDefaultPrelude`. It makes the HLSL prelude hold an include of the *absolute* path to the required include file `nvHLSLExtns.h`. Because an absolute path is used, the files that header includes are found in the correct place without having to set up special include paths.
+
+To use nvAPI it is necessary to specify an unordered access view (UAV) based 'u' register that will be used to communicate with nvAPI. Note! Slang does not do any special handling around this; application code must ensure the UAV is either guaranteed not to collide with what Slang assigns, or is specified (but not used) in the Slang source. The u register number must also be specified to the nvAPI runtime library.
+
+On Vulkan, the [`GL_EXT_shader_atomic_float`](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_shader_atomic_float.html) extension is required.
+
+
+
````
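The contract of `InterlockedAddFp32` is to atomically add a float at a byte offset and optionally hand back the previous value. That contract can be modelled on the CPU with a compare-and-swap loop. The sketch below is a hypothetical illustration, not how Slang, nvAPI or CUDA implement it. `ByteAddressWordsSketch` and its methods are invented names. The byte address is assumed to be 4-byte aligned, which matches the `byteAddress / 4` indexing used by the GLSL path in this commit.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU model of a RWByteAddressBuffer: 32-bit words addressed in bytes.
// Illustrates the semantics only; it is not part of Slang.
class ByteAddressWordsSketch
{
public:
    explicit ByteAddressWordsSketch(std::size_t wordCount) : m_words(wordCount) {}

    // Atomically adds valueToAdd to the float at byteAddress and returns the value
    // that was there before (the 'originalValue' of the out-parameter overload).
    float interlockedAddFp32(std::uint32_t byteAddress, float valueToAdd)
    {
        std::atomic<std::uint32_t>& word = wordAt(byteAddress);
        std::uint32_t expectedBits = word.load();
        for (;;)
        {
            const float original = bitsToFloat(expectedBits);
            const std::uint32_t desiredBits = floatToBits(original + valueToAdd);
            // On failure, compare_exchange_weak writes the current bits into
            // expectedBits, so the next iteration retries with the fresh value.
            if (word.compare_exchange_weak(expectedBits, desiredBits))
                return original;
        }
    }

    float load(std::uint32_t byteAddress) const
    {
        return bitsToFloat(wordAt(byteAddress).load());
    }

private:
    static float bitsToFloat(std::uint32_t bits)
    {
        float value;
        std::memcpy(&value, &bits, sizeof(value));
        return value;
    }

    static std::uint32_t floatToBits(float value)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &value, sizeof(bits));
        return bits;
    }

    // byteAddress must be 4-byte aligned and in range; the word index is byteAddress / 4.
    std::atomic<std::uint32_t>& wordAt(std::uint32_t byteAddress)
    {
        assert(byteAddress % 4 == 0 && byteAddress / 4 < m_words.size());
        return m_words[byteAddress / 4];
    }

    const std::atomic<std::uint32_t>& wordAt(std::uint32_t byteAddress) const
    {
        assert(byteAddress % 4 == 0 && byteAddress / 4 < m_words.size());
        return m_words[byteAddress / 4];
    }

    std::vector<std::atomic<std::uint32_t>> m_words; // value-initialized to 0 (0.0f)
};
```

On the GPU the targets in this commit use a native atomic instead: nvAPI on HLSL, `atomicAdd` on CUDA, and `GL_EXT_shader_atomic_float` on Vulkan. The CAS loop here is simply a portable way to show the read-modify-write contract.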

prelude/slang-cuda-prelude.h

+7
```diff
@@ -472,6 +472,13 @@ struct RWByteAddressBuffer
         *(T*)((char*)data + offset) = value;
     }
 
+    /// Can be used in stdlib to gain access
+    SLANG_CUDA_CALL uint* _getPtrAt(size_t offset)
+    {
+        SLANG_PRELUDE_ASSERT(offset + sizeof(T) <= sizeInBytes && (offset & (alignof(T)-1)) == 0);
+        return (uint*)(((char*)data) + offset);
+    }
+
     uint32_t* data;
     size_t sizeInBytes; //< Must be multiple of 4
 };
```
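`_getPtrAt` returns a raw word pointer so that the stdlib's CUDA intrinsic can cast it to `float*` for `atomicAdd`. The plain C++ sketch below shows the bounds-and-alignment condition that its assertion expresses. Because `_getPtrAt` itself is not a template, the sketch writes the element type out as `uint32_t`. `ByteAddressBufferSketch` and `isValidWordOffset` are hypothetical names, and the code runs on the host, not in CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-in for the CUDA prelude's RWByteAddressBuffer.
struct ByteAddressBufferSketch
{
    std::uint32_t* data;
    std::size_t sizeInBytes; // must be a multiple of 4

    // True if a whole 32-bit word at 'offset' lies inside the buffer and is aligned.
    bool isValidWordOffset(std::size_t offset) const
    {
        return offset + sizeof(std::uint32_t) <= sizeInBytes
            && (offset & (alignof(std::uint32_t) - 1)) == 0;
    }

    // Returns a pointer to the 32-bit word that starts at byte 'offset'.
    std::uint32_t* getPtrAt(std::size_t offset)
    {
        assert(isValidWordOffset(offset));
        return reinterpret_cast<std::uint32_t*>(reinterpret_cast<char*>(data) + offset);
    }
};
```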

source/core/slang-test-tool-util.cpp

+35 -10

```diff
@@ -75,29 +75,54 @@ static SlangResult _addCUDAPrelude(const String& parentPath, slang::IGlobalSessi
     return SLANG_OK;
 }
 
-/* static */SlangResult TestToolUtil::setSessionDefaultPrelude(const char* exePath, slang::IGlobalSession* session)
+/* static */SlangResult TestToolUtil::setSessionDefaultPrelude(const PreludeInfo& info, slang::IGlobalSession* session)
 {
     // Set the prelude to a path
-    String canonicalPath;
-    if (SLANG_SUCCEEDED(Path::getCanonical(exePath, canonicalPath)))
+    if (info.exePath)
     {
-        // Get the directory
-        String parentPath = Path::getParentDirectory(canonicalPath);
+        String exePath(info.exePath);
 
-        if (SLANG_FAILED(_addCPPPrelude(parentPath, session)))
+        String canonicalPath;
+        if (SLANG_SUCCEEDED(Path::getCanonical(exePath, canonicalPath)))
         {
-            SLANG_ASSERT(!"Couldn't find the C++ prelude relative to the executable");
+            // Get the directory
+            String parentPath = Path::getParentDirectory(canonicalPath);
+
+            if (SLANG_FAILED(_addCPPPrelude(parentPath, session)))
+            {
+                SLANG_ASSERT(!"Couldn't find the C++ prelude relative to the executable");
+            }
+
+            if (SLANG_FAILED(_addCUDAPrelude(parentPath, session)))
+            {
+                SLANG_ASSERT(!"Couldn't find the CUDA prelude relative to the executable");
+            }
         }
-
-        if (SLANG_FAILED(_addCUDAPrelude(parentPath, session)))
+    }
+    // If the nvAPI path is set, and we find nvHLSLExtns.h, put that in the HLSL prelude
+    if (info.nvAPIPath)
+    {
+        String includePath;
+        if (SLANG_SUCCEEDED(_calcIncludePath(info.nvAPIPath, "nvHLSLExtns.h", includePath)))
         {
-            SLANG_ASSERT(!"Couldn't find the CUDA prelude relative to the executable");
+            StringBuilder buf;
+
+            buf << "#include \"" << includePath << "\"\n";
+
+            session->setLanguagePrelude(SLANG_SOURCE_LANGUAGE_HLSL, buf.getBuffer());
+            return SLANG_OK;
         }
     }
 
     return SLANG_OK;
 }
 
+/* static */SlangResult TestToolUtil::setSessionDefaultPrelude(const char* exePath, slang::IGlobalSession* session)
+{
+    PreludeInfo info;
+    info.exePath = exePath;
+    return setSessionDefaultPrelude(info, session);
+}
 
 }
 
```
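The new overload sets the HLSL prelude only when `nvHLSLExtns.h` is found under `info.nvAPIPath`. The prelude text it sets is a single `#include` of the header's absolute path. `_calcIncludePath` is not shown in this diff. The sketch below therefore illustrates the idea with hypothetical helpers (`findHeaderAbsolutePath`, `makeIncludePrelude`) built on `std::filesystem`, rather than Slang's `Path` and `String` utilities.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical: look for fileName directly inside dir and return its absolute
// path, or std::nullopt if no regular file with that name exists there.
std::optional<std::string> findHeaderAbsolutePath(const fs::path& dir, const std::string& fileName)
{
    std::error_code ec;
    const fs::path candidate = dir / fileName;
    if (!fs::is_regular_file(candidate, ec))
        return std::nullopt;
    const fs::path absolute = fs::absolute(candidate, ec);
    if (ec)
        return std::nullopt;
    // generic_string() uses '/' separators, which avoids backslashes inside the
    // quoted #include path on Windows.
    return absolute.generic_string();
}

// Builds prelude text consisting of a single #include of the given path.
std::string makeIncludePrelude(const std::string& includePath)
{
    assert(!includePath.empty());
    return "#include \"" + includePath + "\"\n";
}
```

A string built this way plays the role of the `buf.getBuffer()` text that the diff passes to `setLanguagePrelude(SLANG_SOURCE_LANGUAGE_HLSL, ...)`.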
source/core/slang-test-tool-util.h

+8
```diff
@@ -36,6 +36,12 @@ enum class ToolReturnCodeSpan
 /* Utility functions for 'test tools' */
 struct TestToolUtil
 {
+    struct PreludeInfo
+    {
+        const char* exePath = nullptr;
+        const char* nvAPIPath = nullptr;
+    };
+
     typedef SlangResult(*InnerMainFunc)(Slang::StdWriters* stdWriters, SlangSession* session, int argc, const char*const* argv);
 
     /// If the test failed to run or was ignored then we are done
@@ -48,6 +54,8 @@ struct TestToolUtil
     static ToolReturnCode getReturnCode(SlangResult res);
 
     /// Sets the default preludes on the session based on the executable path
+    static SlangResult setSessionDefaultPrelude(const PreludeInfo& preludeInfo, slang::IGlobalSession* session);
+
     static SlangResult setSessionDefaultPrelude(const char* exePath, slang::IGlobalSession* session);
 };
 
```

source/slang/hlsl.meta.slang

+62
```diff
@@ -48,6 +48,29 @@ struct ByteAddressBuffer
     }
 };
 
+
+// Make the GLSL atomicAdd available.
+// We have separate int/float implementations, as the float version requires some specific extensions
+// https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_atomic_float.txt
+
+__target_intrinsic(glsl, "atomicAdd($0, $1)")
+__glsl_version(430)
+__glsl_extension(GL_EXT_shader_atomic_float)
+//__glsl_extension(GL_EXT_gpu_shader5)
+float __atomicAdd(__ref float value, float amount);
+
+// Int versions require glsl 4.30
+// https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/atomicAdd.xhtml
+
+__target_intrinsic(glsl, "atomicAdd($0, $1)")
+__glsl_version(430)
+int __atomicAdd(__ref int value, int amount);
+
+__target_intrinsic(glsl, "atomicAdd($0, $1)")
+__glsl_version(430)
+uint __atomicAdd(__ref uint value, uint amount);
+
+
 __intrinsic_op($(kIROp_ByteAddressBufferLoad))
 T __byteAddressBufferLoad<T>(ByteAddressBuffer buffer, int offset);
 
@@ -159,6 +182,41 @@ struct $(item.name)
     {
         return __byteAddressBufferLoad<T>(this, location);
     }
+${{{{
+    if (item.op == kIROp_HLSLRWByteAddressBufferType)
+    {
+}}}}
+
+    // float32 and int64 atomic support. This is a Slang specific extension, it uses
+    // GL_EXT_shader_atomic_float on vk
+    // NvAPI support on DX
+    // NOTE! To use this feature on HLSL, the shader needs to include 'nvHLSLExtns.h' from the NvAPI SDK
+    //
+    __target_intrinsic(hlsl, "($3 = NvInterlockedAddFp32($0, $1, $2))")
+    __target_intrinsic(cuda, "(*$3 = atomicAdd((float*)$0._getPtrAt($1), $2))")
+    void InterlockedAddFp32(uint byteAddress, float valueToAdd, out float originalValue);
+
+    __specialized_for_target(glsl)
+    void InterlockedAddFp32(uint byteAddress, float valueToAdd, out float originalValue)
+    {
+        RWStructuredBuffer<float> buf = __getEquivalentStructuredBuffer<float>(this);
+        originalValue = __atomicAdd(buf[byteAddress / 4], valueToAdd);
+    }
+
+    __target_intrinsic(hlsl, "(NvInterlockedAddFp32($0, $1, $2))")
+    __target_intrinsic(cuda, "atomicAdd((float*)$0._getPtrAt($1), $2)")
+    void InterlockedAddFp32(uint byteAddress, float valueToAdd);
+
+    __specialized_for_target(glsl)
+    void InterlockedAddFp32(uint byteAddress, float valueToAdd)
+    {
+        RWStructuredBuffer<float> buf = __getEquivalentStructuredBuffer<float>(this);
+        __atomicAdd(buf[byteAddress / 4], valueToAdd);
+    }
+
+${{{{
+    }
+}}}}
 
     // Added operations:
 
@@ -1091,6 +1149,10 @@ T dot(vector<T, N> x, vector<T, N> y)
 
 __generic<T : __BuiltinFloatingPointType> vector<T,4> dst(vector<T,4> x, vector<T,4> y);
 
+// Given a RWByteAddressBuffer allow it to be interpreted as a RWStructuredBuffer
+__intrinsic_op($(kIROp_GetEquivalentStructuredBuffer))
+RWStructuredBuffer<T> __getEquivalentStructuredBuffer<T>(RWByteAddressBuffer b);
+
 // Error message
 
 // void errorf( string format, ... );
```
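The `__target_intrinsic` strings above use `$0`, `$1`, ... to refer to call arguments. In the member-function forms, `$0` appears to be the buffer itself, followed by the declared parameters. The sketch below is a deliberately simplified substitution model that handles only single-digit `$N`, and it shows what the CUDA and GLSL templates expand to. `expandIntrinsicSketch` is hypothetical; Slang's real emitter is not part of this diff and may handle more forms.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model: replace each "$N" (N a single digit) with args[N]; copy
// every other character unchanged.
std::string expandIntrinsicSketch(const std::string& pattern, const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        const char c = pattern[i];
        const bool isPlaceholder = c == '$' && i + 1 < pattern.size()
            && std::isdigit(static_cast<unsigned char>(pattern[i + 1])) != 0;
        if (isPlaceholder)
        {
            const std::size_t index = static_cast<std::size_t>(pattern[i + 1] - '0');
            assert(index < args.size());
            out += args[index];
            ++i; // skip the digit that was just consumed
        }
        else
        {
            out += c;
        }
    }
    return out;
}
```

For example, expanding the CUDA template with the arguments `buf`, `offset`, `value` and `outPtr` gives `(*outPtr = atomicAdd((float*)buf._getPtrAt(offset), value))`.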

source/slang/slang-check-decl.cpp

+79 -12

```diff
@@ -3520,6 +3520,7 @@ namespace Slang
         return subst;
     }
 
+#if 0
     // For simplicity we will make having a definition of a function include having a body or a target intrinsics defined.
     // It may be useful to add other modifiers to mark as having body - for example perhaps
     // any target intrinsic modifier (like SPIR-V version) should be included.
@@ -3536,6 +3537,40 @@ namespace Slang
     {
         return decl->body || decl->hasModifier<TargetIntrinsicModifier>();
     }
+#endif
+
+    typedef Dictionary<Name*, CallableDecl*> TargetDeclDictionary;
+
+    static void _addTargetModifiers(CallableDecl* decl, TargetDeclDictionary& ioDict)
+    {
+        if (auto specializedModifier = decl->findModifier<SpecializedForTargetModifier>())
+        {
+            // If it's specialized for target it should have a body...
+            if (auto funcDecl = as<FunctionDeclBase>(decl))
+            {
+                SLANG_ASSERT(funcDecl->body);
+            }
+            Name* targetName = specializedModifier->targetToken.getName();
+
+            ioDict.AddIfNotExists(targetName, decl);
+        }
+        else
+        {
+            for (auto modifier : decl->getModifiersOfType<TargetIntrinsicModifier>())
+            {
+                Name* targetName = modifier->targetToken.getName();
+                ioDict.AddIfNotExists(targetName, decl);
+            }
+
+            auto funcDecl = as<FunctionDeclBase>(decl);
+            if (funcDecl && funcDecl->body)
+            {
+                // Should only be one body if it isn't specialized for target.
+                // Use nullptr for this scenario
+                ioDict.AddIfNotExists(nullptr, decl);
+            }
+        }
+    }
 
     Result SemanticsVisitor::checkFuncRedeclaration(
         FuncDecl* newDecl,
@@ -3701,23 +3736,55 @@ namespace Slang
         // with the case where the two function declarations
         // might represent different target-specific versions
         // of a function.
-        //
-        // TODO: if the two declarations are specialized for
-        // different targets, then skip the body checks below.
-        //
-        // ???: Why isn't this problem showing up in practice?
-
+
         // If both of the declarations have a body, then there
         // is trouble, because we wouldn't know which one to
         // use during code generation.
-        if (_isDefinition(newDecl) && _isDefinition(oldDecl))
+
+        // Here to cover the 'bodies'/target_intrinsics, we find all the targets
+        // that are previously defined, and make sure the new definition
+        // doesn't try and define what is already defined.
         {
-            // Redefinition
-            getSink()->diagnose(newDecl, Diagnostics::functionRedefinition, newDecl->getName());
-            getSink()->diagnose(oldDecl, Diagnostics::seePreviousDefinitionOf, newDecl->getName());
+            TargetDeclDictionary currentTargets;
+            {
+                CallableDecl* curDecl = newDecl->primaryDecl;
+                while (curDecl)
+                {
+                    if (curDecl != newDecl)
+                    {
+                        _addTargetModifiers(curDecl, currentTargets);
+                    }
+                    curDecl = curDecl->nextDecl;
+                }
+            }
 
-            // Don't bother emitting other errors
-            return SLANG_FAIL;
+            // Add the targets for this new decl
+            TargetDeclDictionary newTargets;
+            _addTargetModifiers(newDecl, newTargets);
+
+            bool hasConflict = false;
+            for (auto& pair : newTargets)
+            {
+                Name* target = pair.Key;
+                auto found = currentTargets.TryGetValue(target);
+                if (found)
+                {
+                    // Redefinition
+                    if (!hasConflict)
+                    {
+                        getSink()->diagnose(newDecl, Diagnostics::functionRedefinition, newDecl->getName());
+                        hasConflict = true;
+                    }
+
+                    auto prevDecl = *found;
+                    getSink()->diagnose(prevDecl, Diagnostics::seePreviousDefinitionOf, prevDecl->getName());
+                }
+            }
+
+            if (hasConflict)
+            {
+                return SLANG_FAIL;
+            }
         }
 
         // At this point we've processed the redeclaration and
```