NVAPI support doc (shader-slang#1574)

jsmall-zzz · web-flow · commit 66ab6f4ecbef · 2020-10-09T09:54:16.000-04:00
* #include an absolute path didn't work - because paths were taken to always be relative.

* Split out NVAPI documentation.
Attempt to describe updated usage.

* Discuss downstream compiler include paths issues.

* Fix links .

* Apparently github supports relative links...

* Fix typo.
diff --git a/docs/nvapi-support.md b/docs/nvapi-support.md
@@ -0,0 +1,85 @@
+NVAPI Support
+=============
+
+Slang provides support for [NVAPI](https://developer.nvidia.com/nvapi) in several ways:
+
+* Slang allows the use of NVAPI directly, by including the header with `#include "nvHLSLExtns.h"` in your Slang code. Doing so makes all the NVAPI functions directly available and usable within your Slang source code.
+* NVAPI is used to provide features implicitly for certain targets. For example, [RWByteAddressBuffer atomics](target-compatibility.md) on HLSL-based targets are currently supported via NVAPI.
+* Direct and implicit NVAPI usage can be freely mixed. 
+
+Direct usage of NVAPI
+=====================
+
+Direct usage of NVAPI just requires the inclusion of the appropriate NVAPI header, typically with `#include "nvHLSLExtns.h"` within your Slang source. As NVAPI requires, the slot, and perhaps the space, must be specified before the `#include`. For example, a typical direct NVAPI usage inside a Slang source file might contain something like...
+
+```
+#define NV_SHADER_EXTN_SLOT u0 
+#include "nvHLSLExtns.h"
+```
+
+In order for the include to work, the include path must contain the folder that holds `nvHLSLExtns.h` and its associated headers.
+
+Implicit usage of NVAPI
+=======================
+
+It is convenient and powerful to be able to use NVAPI calls directly, but doing so will only work on targets that support the mechanism, even if there is a way to support the functionality some other way.
+
+Slang provides some cross-platform features on HLSL-based targets that are implemented via NVAPI. For example, RWByteAddressBuffer atomics are supported on Vulkan, DX12 and CUDA. On DX12 they are made available via NVAPI, whilst CUDA and Vulkan have direct support. When compiling Slang code that uses RWByteAddressBuffer atomics, Slang will emit HLSL code that uses NVAPI. In order for the downstream compiler to be able to compile this HLSL, it must be able to include the NVAPI header `nvHLSLExtns.h`.
+
+It is worth discussing briefly how this mechanism works. Slang has a 'prelude' mechanism for different source targets. The prelude is a piece of text that is inserted before the source that is output from compiling the input Slang source code. There is a default prelude for HLSL that is something like
+
+```
+#ifdef SLANG_HLSL_ENABLE_NVAPI
+#include "nvHLSLExtns.h"
+#endif
+```
+
+If the Slang source makes any implicit calls to NVAPI, then the following is emitted before the prelude
+
+```
+#define SLANG_HLSL_ENABLE_NVAPI 1
+#define NV_SHADER_EXTN_SLOT u0
+#define NV_SHADER_EXTN_REGISTER_SPACE space0
+```
+
+This causes the prelude to include `nvHLSLExtns.h`, and specifies the slot and potentially the space, as is required for inclusion of `nvHLSLExtns.h`.
+
+The actual values for the slot, and optionally the space, are found by Slang examining the values of `NV_SHADER_EXTN_SLOT` and `NV_SHADER_EXTN_REGISTER_SPACE` at the end of preprocessing the input Slang source files.
+
+This means that if you compile Slang source that makes implicit use of NVAPI, the slot, and optionally the space, must be defined. This can be achieved with a command line -D, through the API, or through having suitable `#define`s in the Slang source code.
+
+It is worth noting that if you *replace* the default HLSL prelude and use NVAPI, then your custom prelude will need to contain something like the default HLSL prelude.
+
+Downstream Compiler Include
+---------------------------
+
+There is a subtle detail that is perhaps worth noting here around the downstream compiler and `#include`s. When Slang outputs HLSL it typically does not contain any `#include`, because all of the `#include`s in the original source code have been handled by Slang. Slang then outputs everything required to compile to the downstream compiler *without* any `#include`. When NVAPI is used explicitly this is still the case - the NVAPI headers are consumed by Slang, and then Slang will output HLSL that does not contain any `#include`.
+
+The astute reader may have noticed that the new default Slang HLSL prelude *does* contain an include. So when Slang outputs NVAPI calls from implicit use, this `#include` will be enabled.
+
+```
+#ifdef SLANG_HLSL_ENABLE_NVAPI
+#include "nvHLSLExtns.h"
+#endif
+```
+
+This means that the *downstream* compiler (such as DXC and FXC) must be able to handle this include. 
+
+As it turns out, all the include paths specified to Slang (via command line -I or through the API) are passed down to the include handlers for FXC and DXC.
+
+In the simplest use case where the path to `nvHLSLExtns.h` is specified in the include paths everything should 'just work' - as both Slang and the downstream compilers will see these include paths and so can handle the include. 
+
+Things are more complicated if there is mixed implicit/explicit NVAPI usage and in the Slang source the include path is set up such that NVAPI is included with
+
+```
+#include "nvapi/nvHLSLExtns.h"
+```
+
+This won't work directly with the implicit usage, as the downstream compiler includes the header as `"nvHLSLExtns.h"`. One way to work around this is to alter the HLSL prelude so that the same `#include` is used.
+
+Links
+-----
+
+More details on how this works can be found in the following PR
+
+* [Simplify workflow when using NVAPI #1556](https://github.com/shader-slang/slang/pull/1556)
diff --git a/docs/target-compatibility.md b/docs/target-compatibility.md
@@ -193,45 +193,20 @@ void RWByteAddressBuffer::InterlockedAddI64(uint byteAddress, int64_t valueToAdd
 
 void RWByteAddressBuffer::InterlockedCompareExchangeU64(uint byteAddress, uint64_t compareValue, uint64_t value, out uint64_t outOriginalValue);
 
+uint64_t RWByteAddressBuffer::InterlockedExchangeU64(uint byteAddress, uint64_t value);
+
 uint64_t RWByteAddressBuffer::InterlockedMaxU64(uint byteAddress, uint64_t value);
 uint64_t RWByteAddressBuffer::InterlockedMinU64(uint byteAddress, uint64_t value);
 
 uint64_t RWByteAddressBuffer::InterlockedAndU64(uint byteAddress, uint64_t value);
 uint64_t RWByteAddressBuffer::InterlockedOrU64(uint byteAddress, uint64_t value);
 uint64_t RWByteAddressBuffer::InterlockedXorU64(uint byteAddress, uint64_t value);
-```
-
-On HLSL based targets this functionality is achieved using [NVAPI](https://developer.nvidia.com/nvapi). For this to work it is necessary to have NVAPI available on your system. The 'prelude' functionality in the Slang API allows for text to be inserted before any Slang code generated code is output. If the input source uses an NVAPI feature - like the methods above - it will output code that *assumes* that `nvHLSLExtns.h` is included.  The following code from `render-test-main.cpp` sets up a suitable prelude for HLSL that includes `nvHLSLExtns.h` with an absolute path.
-
-```
-String rootPath;
-SLANG_RETURN_ON_FAIL(TestToolUtil::getRootPath(exePath, rootPath));
-
-String includePath;
-SLANG_RETURN_ON_FAIL(TestToolUtil::getIncludePath(rootPath, "external/nvapi/nvHLSLExtns.h", includePath));
 
-StringBuilder buf;
-// We have to choose a slot that NVAPI will use. 
-buf << "#define NV_SHADER_EXTN_SLOT " << options.nvapiRegister << "\n";
-
-// Include the NVAPI header
-buf << "#include \"" << includePath << "\"\n\n";
-
-session->setLanguagePrelude(SLANG_SOURCE_LANGUAGE_HLSL, buf.getBuffer());
-```        
-
-This sets the HLSL prelude to something like...
 
 ```
-#define NV_SHADER_EXTN_SLOT u0
-#include "d:/path/to/nvapi/nvHLSLExtns.h"
-```
-
-Note the use of the *absolute* path to the file `nvHLSLExtns.h`. Doing so means the other includes that `nvHLSLExtns.h` includes look in the correct place without having to set up special include paths. As is required by using NVAPI, before the include it is necessary to specify what UAV will be used. 
-
-To use NVAPI it is nessary to specify a unordered access views (UAV) based 'u' register that will be used to communicate with NVAPI. 
 
-Note! Slang does not do any special handling around this, it will be necessary for application code to ensure the UAV is either guarenteed to not collide with what Slang assigns, or it's specified (but not used) in the Slang source. The u register number has to be specified also to the NVAPI runtime library. 
+On HLSL based targets this functionality is achieved using [NVAPI](https://developer.nvidia.com/nvapi). Support for NVAPI is described
+in the separate [NVAPI Support](nvapi-support.md) document.  
 
 On Vulkan, for float the [`GL_EXT_shader_atomic_float`](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_shader_atomic_float.html) extension is required. For int64 the [`GL_EXT_shader_atomic_int64`](https://raw.githubusercontent.com/KhronosGroup/GLSL/master/extensions/ext/GL_EXT_shader_atomic_int64.txt) extension is required.