* `#include` of an absolute path didn't work, because paths were always taken to be relative.
* Fix for writing to RWTexture with half types on CUDA.
* CUDA half functionality doc updates.
* First pass support for sust.p RWTexture format conversion on write.
* Tidy up implementation of $C.
Made clamping mode #define able.
* A simple test for RWTexture CUDA format conversion.
* Add support for float2 and float4.
* WIP conversion testing.
* Use $E to fix byte addressing in X in CUDA.
* Do not scale when accessing via _convert versions of surface functions.
* Revert to previous test.
* Test with half/float convert write/read.
* More broad half->float read conversion testing.
* Improve documentation around half and RWTexture conversion.
docs/cuda-target.md (+51, -6)
@@ -20,11 +20,11 @@ These limitations apply to Slang transpiling to CUDA.
 * Samplers are not separate objects in CUDA - they are combined into a single 'TextureObject'. So samplers are effectively ignored on CUDA targets.
 * When using a TextureArray.Sample (layered texture in CUDA) - the index will be treated as an int, as this is all CUDA allows
 * Care must be used in using `WaveGetLaneIndex` wave intrinsic - it will only give the right results for appropriate launches
-* CUDA 'surfaces' are used for textures which are read/write. CUDA does NOT do format conversion with surfaces.
+* CUDA 'surfaces' are used for textures which are read/write (aka RWTexture).

 The following are a work in progress or not implemented but are planned to be so in the future

-* Some resource types remain unsupported, and not all methods on types are supported
+* Some resource types remain unsupported, and not all methods on all types are supported

 # How it works

@@ -122,8 +122,6 @@ The UniformState and UniformEntryPointParams struct typically vary by shader. Un
 size_t sizeInBytes;
 ```

-
-
 ## Texture

 Read only textures will be bound as the opaque CUDA type CUtexObject. This type is the combination of both a texture AND a sampler. This is somewhat different from HLSL, where there can be separate `SamplerState` variables. This allows access of a single texture binding with different types of sampling.
@@ -138,11 +136,58 @@ Load is only supported for Texture1D, and the mip map selection argument is igno

 RWTexture types are converted into CUsurfObject type.

-In CUDA it is not possible to do a format conversion on an access to a CUsurfObject, so it must be backed by the same data format as is used within the Slang source code.
+In regular CUDA it is not possible to do a format conversion on an access to a CUsurfObject. Slang does add support for hardware write conversions where they are available. To enable the feature it is necessary to attribute your RWTexture with `format`. For example
+
+```
+[format("rg16f")]
+RWTexture2D<float2> rwt2D_2;
+```
+
+The format names used are the same as for [GLSL layout format types](https://www.khronos.org/opengl/wiki/Layout_Qualifier_(GLSL)). If no format is specified Slang will *assume* that the format is the same as the type specified.
+
+Note that the format attribution is on variables/parameters/fields and is not part of the type system. This means that if you have a scenario like...
+
+```
+[format("rg16f")]
+RWTexture2D<float2> g_texture;
+
+float2 getValue(RWTexture2D<float2> t)
+{
+    return t[int2(0, 0)];
+}
+
+void doThing()
+{
+    float2 v = getValue(g_texture);
+}
+```
+
+Here, `getValue` will receive `t` *without* the format attribute, and so will access it, presumably erroneously. A workaround for this specific scenario would be to attribute the parameter
+This will only work correctly if `getValue` is called with a `t` that has that format attribute. As it stands no checking is performed on this matching so no error or warning will be produced if there is a mismatch.
+
+There is limited software support for doing a conversion on reading. Currently this supports only 1D, 2D and 3D RWTextures backed with half1, half2 or half4. For this path to work NVRTC must have `cuda_fp16.h` and associated files available. Please check the section on `Half Support`.
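To illustrate what a software half-to-float conversion on read involves, here is a minimal host-side C++ sketch. It is not the code in the Slang CUDA prelude (which relies on `cuda_fp16.h`); `halfBitsToFloat` and `readHalfTexel` are hypothetical names, and the backing store is assumed to be a plain array of 16-bit IEEE 754 half values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: expand an IEEE 754 binary16 bit pattern to a float.
inline float halfBitsToFloat(std::uint16_t h)
{
    const std::uint32_t sign = (std::uint32_t(h) & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    std::uint32_t mantissa = h & 0x3FFu;
    std::uint32_t bits = 0;

    if (exponent == 0x1Fu)
    {
        // Infinity or NaN: keep the payload, use the all-ones float exponent.
        bits = sign | 0x7F800000u | (mantissa << 13);
    }
    else if (exponent != 0)
    {
        // Normal number: rebias the exponent from 15 to 127 (127 - 15 = 112).
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    }
    else if (mantissa == 0)
    {
        // Signed zero.
        bits = sign;
    }
    else
    {
        // Subnormal half: shift until the implicit leading bit appears.
        std::uint32_t shift = 0;
        while ((mantissa & 0x400u) == 0)
        {
            mantissa <<= 1;
            ++shift;
        }
        bits = sign | ((113u - shift) << 23) | ((mantissa & 0x3FFu) << 13);
    }

    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// Hypothetical stand-in for reading one half-backed texel as a float.
inline float readHalfTexel(const std::uint16_t* texels, std::size_t count, std::size_t index)
{
    assert(index < count);
    return halfBitsToFloat(texels[index]);
}
```

On a real CUDA target the conversion would normally be done with the intrinsics from `cuda_fp16.h` rather than by hand; the sketch only shows what the read path has to compute.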
+
+If hardware read conversions are desired, this can be achieved by having a Texture<T> that uses the surface of a RWTexture<T>. Using the Texture<T> not only allows hardware conversion but also filtering.

 It is also worth noting that CUsurfObjects in CUDA are NOT allowed to have mip maps.

-By default surface access uses cudaBoundaryModeZero, this can be replaced using the macro SLANG_CUDA_BOUNDARY_MODE in the CUDA prelude.
+By default surface access uses cudaBoundaryModeZero; this can be replaced using the macro SLANG_CUDA_BOUNDARY_MODE in the CUDA prelude. For HW format conversions the macro SLANG_PTX_BOUNDARY_MODE is used instead. These boundary settings are in effect global for the whole of the kernel.
+
+`SLANG_CUDA_BOUNDARY_MODE` can be one of
+
+* cudaBoundaryModeZero drops stores to out-of-bounds addresses
+* cudaBoundaryModeClamp stores data at the nearest surface location (sized appropriately)
+* cudaBoundaryModeTrap causes an execution trap on out-of-bounds addresses
+
+`SLANG_PTX_BOUNDARY_MODE` can be one of `trap`, `clamp` or `zero`. In general it is recommended to have both set to the same type of value, for example `cudaBoundaryModeZero` and `zero`.
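The three boundary behaviours can be modelled on the host to make the difference concrete. The following C++ sketch is an illustration only: `BoundaryMode`, `Surface2D`, `writeTexel` and `readTexel` are hypothetical names, not the CUDA or Slang API, and the semantics follow the CUDA documentation (zero: out-of-range reads return zero and writes are dropped; clamp: the nearest valid texel is used; trap: the access fails).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model; names mirror the CUDA enumerators only loosely.
enum class BoundaryMode { Zero, Clamp, Trap };

struct Surface2D
{
    int width;
    int height;
    std::vector<float> texels; // row-major, width * height elements
};

inline int clampCoord(int v, int size)
{
    return v < 0 ? 0 : (v >= size ? size - 1 : v);
}

inline bool inBounds(const Surface2D& s, int x, int y)
{
    return x >= 0 && x < s.width && y >= 0 && y < s.height;
}

inline void writeTexel(Surface2D& s, int x, int y, float value, BoundaryMode mode)
{
    if (!inBounds(s, x, y))
    {
        switch (mode)
        {
        case BoundaryMode::Zero:
            return; // the store is dropped
        case BoundaryMode::Clamp:
            x = clampCoord(x, s.width);
            y = clampCoord(y, s.height);
            break;
        case BoundaryMode::Trap:
            // Stand-in for the kernel trapping.
            throw std::out_of_range("surface write out of bounds");
        }
    }
    const std::size_t index = std::size_t(y) * std::size_t(s.width) + std::size_t(x);
    assert(index < s.texels.size());
    s.texels[index] = value;
}

inline float readTexel(const Surface2D& s, int x, int y, BoundaryMode mode)
{
    if (!inBounds(s, x, y))
    {
        switch (mode)
        {
        case BoundaryMode::Zero:
            return 0.0f; // out-of-range reads return zero
        case BoundaryMode::Clamp:
            x = clampCoord(x, s.width);
            y = clampCoord(y, s.height);
            break;
        case BoundaryMode::Trap:
            throw std::out_of_range("surface read out of bounds");
        }
    }
    return s.texels[std::size_t(y) * std::size_t(s.width) + std::size_t(x)];
}
```

Because the setting is effectively global for a kernel, mixing modes between the CUDA and PTX paths would make the same out-of-bounds store behave differently depending on which path it takes, which is the motivation for keeping the two macros in agreement.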
SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert<float>(float v, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode)
SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf2Dwrite_convert<float2>(float2 v, cudaSurfaceObject_t surfObj, int x, int y, cudaSurfaceBoundaryMode boundaryMode)
SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert<float2>(float2 v, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode)
SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf2Dwrite_convert<float4>(float4 v, cudaSurfaceObject_t surfObj, int x, int y, cudaSurfaceBoundaryMode boundaryMode)
SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert<float4>(float4 v, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode)
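As far as the commit notes indicate, the `_convert` overloads above take `x` as a texel index and do not scale it, whereas CUDA's byte-addressed surface functions such as `surf2Dwrite` expect the `x` coordinate in bytes. A small host-side C++ sketch of that difference follows; `Float2`, `byteAddressedX` and `elementAddressedX` are hypothetical names used only for illustration and are not part of the prelude.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's float2 so the sketch compiles on the host.
struct Float2
{
    float x;
    float y;
};

// X coordinate for a byte-addressed surface access: texel index times texel size.
template <typename T>
inline int byteAddressedX(int texelX)
{
    assert(texelX >= 0);
    return texelX * static_cast<int>(sizeof(T));
}

// X coordinate for a format-converting access: the texel index is passed through,
// because the backing format (for example rg16f) need not match sizeof(T).
inline int elementAddressedX(int texelX)
{
    assert(texelX >= 0);
    return texelX;
}
```

For example, texel 3 of a `Float2` surface sits at byte offset 24 for the byte-addressed path, but is still addressed as 3 on the converting path.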
// The VK back-end gets away with this kind of coincidentally, since the "legalization" we have to do for resources means that there wouldn't be a single f() function any more.
// But for CUDA and C++ that's not the case or generally desirable.