Skip to content

Commit 281c67b

Browse files
author
Tim Foley
authored
Confirm layout for structured buffer of matrices, and related fixes (shader-slang#889)
* Fix up handling of NumElements for D3D buffer views For everything but structured buffers we'd been setting this to the size in bytes, but that isn't really valid at all. The `NumElements` member in the view descs is supposed to be the number of buffer elements, so it would be capped at the byte size divided by the element size. This change fixes the computation of `NumElements` to take the size of the format into account for non-structured views. For the "raw" case, we use a size of 4 bytes since that matches the `DXGI_FORMAT_R32_TYPELESS` format we use (which seems to be required for raw buffers). I also added support for the raw case for SRVs where it didn't seem to be supported before (not that any of our tests cover it). * Fix handling of size padding for D3D11 buffers The existing code was enforcing a 256-byte-aligned size for all buffers, but this can cause problems for a structured buffer. A structured buffer must have a size that is a multiple of the stride, so a structured buffer with a 48-byte stride and a 96-byte size would get rounded up to 256 bytes, which is not an integer multiple of 48. This change makes it so that we only apply the padding to constant buffers. According to MSDN, constant buffers only require padding to a 16-byte aligned size, and no other restrictions are listed for D3D11, but it is difficult to know whether those constrains are exhaustive. I've left in the 256-byte padding for now (rather than switch to 16-byte), even though I suspect that was only needed as a band-aid for the `NumElements` issue fixed by another commit. * Fix an IR generation bug when indxing into a strutured buffer element The problem here arises when we have a structured buffer of matrices (an array type would likely trigger it too): ```hlsl RWStructuredBuffer<float3x4> gMatrices; ``` and then we index into it directly, rather than copying to a temporary: ```hlsl // CRASH: float v = gMatrices[i][j][k]; // OKAY: float3x4 m = gMatrices[i]; float v = m[j][k]; ``` The underlying issue is that our IR lowering pass tries to defer the decision about whether to use a `get` vs. `set` vs. `ref` accessor for a subscript until as late as possible (this is to deal with the fact that sometimes D3D can provide a `ref` accessor where GLSL can only provide a `get` or `set`). We probably need to overhaul that aspect of IR codegen sooner or later, but this change uses some of the existing machinery to try to force the `gMatrices[i]` subexpression to take the form of a pointer when doing sub-indexing like this. This fixes the present case, and hopefully shouldn't break anything else that used to work (because the subroutines I'm using to coerce the `gMatrices[i]` expression should be idempotent on the cases that were already implemented). * Add a test case to confirm fxc/dxc layout for structured buffers of matrices Even when row-major layout is requested globally, fxc and dxc seem to lay out a `StructuredBuffer<float3x4>` with column-major layout on the elements. This commit adds a test that confirms that behavior. This commit does not try to implement a fix for the issue (either fixing Slang's layout reflection information to be correct for what fxc/dxc do in practice, or fixing Slang's HLSL output to work around the fxc/dxc behavior), but just documents the status quo. If/when we decide how we'd like to handle the issue long-term, this test can/should be updated to match the decision we make. * fixup: build breakage on clang/gcc This is one of those cases where the Microsoft compiler is letting through some stuff that isn't technically valid C++ ("delayed template parsing"). Fixed by just moving some declarations to earlier in the file.
1 parent 4aab9cc commit 281c67b

5 files changed

+163
-39
lines changed

source/slang/lower-to-ir.cpp

+48-29
Original file line numberDiff line numberDiff line change
@@ -1847,6 +1847,43 @@ void addArgs(
18471847
}
18481848
}
18491849

1850+
//
1851+
1852+
// When we try to turn a `LoweredValInfo` into an address of some temporary storage,
1853+
// we can either do it "aggressively" or not (what we'll call the "default" behavior,
1854+
// although it isn't strictly more common).
1855+
//
1856+
// The case that this is mostly there to address is when somebody writes an operation
1857+
// like:
1858+
//
1859+
// foo[a] = b;
1860+
//
1861+
// In that case, we might as well just use the `set` accessor if there is one, rather
1862+
// than complicate things. However, in more complex cases like:
1863+
//
1864+
// foo[a].x = b;
1865+
//
1866+
// there is no way to satisfy the semantics of the code the user wrote (in terms of
1867+
// only writing one vector component, and not a full vector) by using the `set`
1868+
// accessor, and we need to be "aggressive" in turning the lvalue `foo[a]` into
1869+
// an address.
1870+
//
1871+
// TODO: realistically IR lowering is too early to be binding to this choice,
1872+
// because different accessors might be supported on different targets.
1873+
//
1874+
enum class TryGetAddressMode
1875+
{
1876+
Default,
1877+
Aggressive,
1878+
};
1879+
1880+
/// Try to coerce `inVal` into a `LoweredValInfo::ptr()` with a simple address.
1881+
LoweredValInfo tryGetAddress(
1882+
IRGenContext* context,
1883+
LoweredValInfo const& inVal,
1884+
TryGetAddressMode mode);
1885+
1886+
18501887
//
18511888

18521889
template<typename Derived>
@@ -2607,6 +2644,17 @@ struct ExprLoweringVisitorBase : ExprVisitor<Derived, LoweredValInfo>
26072644
IRInst* indexVal)
26082645
{
26092646
auto builder = getBuilder();
2647+
2648+
// The `tryGetAddress` operation will take a complex value representation
2649+
// and try to turn it into a single pointer, if possible.
2650+
//
2651+
baseVal = tryGetAddress(context, baseVal, TryGetAddressMode::Aggressive);
2652+
2653+
// The `materialize` operation should ensure that we only have to deal
2654+
// with the small number of base cases for lowered value representations.
2655+
//
2656+
baseVal = materialize(context, baseVal);
2657+
26102658
switch (baseVal.flavor)
26112659
{
26122660
case LoweredValInfo::Flavor::Simple:
@@ -3646,35 +3694,6 @@ static LoweredValInfo moveIntoMutableTemp(
36463694
return var;
36473695
}
36483696

3649-
// When we try to turn a `LoweredValInfo` into an address of some temporary storage,
3650-
// we can either do it "aggressively" or not (what we'll call the "default" behavior,
3651-
// although it isn't strictly more common).
3652-
//
3653-
// The case that this is mostly there to address is when somebody writes an operation
3654-
// like:
3655-
//
3656-
// foo[a] = b;
3657-
//
3658-
// In that case, we might as well just use the `set` accessor if there is one, rather
3659-
// than complicate things. However, in more complex cases like:
3660-
//
3661-
// foo[a].x = b;
3662-
//
3663-
// there is no way to satisfy the semantics of the code the user wrote (in terms of
3664-
// only writing one vector component, and not a full vector) by using the `set`
3665-
// accessor, and we need to be "aggressive" in turning the lvalue `foo[a]` into
3666-
// an address.
3667-
//
3668-
// TODO: realistically IR lowering is too early to be binding to this choice,
3669-
// because different accessors might be supported on different targets.
3670-
//
3671-
enum class TryGetAddressMode
3672-
{
3673-
Default,
3674-
Aggressive,
3675-
};
3676-
3677-
/// Try to coerce `inVal` into a `LoweredValInfo::ptr()` with a simple address.
36783697
LoweredValInfo tryGetAddress(
36793698
IRGenContext* context,
36803699
LoweredValInfo const& inVal,
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
// matrix-layout-structured-buffer.slang
2+
3+
// This test is set up to confirm that `StructuredBuffer` types are
4+
// always laid out column-major by fxc/dxc, even when row-major layout has been
5+
// requested globally.
6+
//
7+
// This behavior should be considered a bug in Slang because either:
8+
//
9+
// 1. we should report reflection layout information that acknowledges this behavior, or
10+
// 2. we should alter our HLSL output passed to fxc/dxc to provide consistent
11+
// behavior that matches our reflection data.
12+
//
13+
// For now this test exists to document the situation. It's output can/should
14+
// be updated if we decide to fix the underlying problem by taking option (2).
15+
//
16+
17+
//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -xslang -matrix-layout-row-major
18+
//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -xslang -matrix-layout-column-major
19+
20+
21+
//TEST_INPUT:ubuffer(data=[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23], stride=48):dxbinding(0),glbinding(0)
22+
RWStructuredBuffer<int3x4> gMatrices;
23+
24+
int test(int val)
25+
{
26+
int N = 256;
27+
28+
int tmp = 0;
29+
30+
tmp = tmp*N + gMatrices[val%2][(val )%3][(val )%4];
31+
tmp = tmp*N + gMatrices[val%2][(val+1)%3][(val )%4];
32+
tmp = tmp*N + gMatrices[val%2][(val )%3][(val+1)%4];
33+
tmp = tmp*N + val;
34+
tmp = tmp + 0x80000000;
35+
36+
return tmp;
37+
}
38+
39+
//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0], stride=4):dxbinding(0),glbinding(1),out
40+
RWStructuredBuffer<int> buffer;
41+
42+
[numthreads(12, 1, 1)]
43+
void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
44+
{
45+
uint tid = dispatchThreadID.x;
46+
47+
int val = tid;
48+
val = test(val);
49+
50+
buffer[tid] = val;
51+
}
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
80010300
2+
90111301
3+
88060B02
4+
95160C03
5+
81020404
6+
910F1405
7+
86070906
8+
96170D07
9+
82000508
10+
8F101209
11+
87080A0A
12+
97150E0B

tools/gfx/render-d3d11.cpp

+37-8
Original file line numberDiff line numberDiff line change
@@ -778,8 +778,15 @@ Result D3D11Renderer::createBufferResource(Resource::Usage initialUsage, const B
778778
BufferResource::Desc srcDesc(descIn);
779779
srcDesc.setDefaults(initialUsage);
780780

781-
// Make aligned to 256 bytes... not sure why, but if you remove this the tests do fail.
782-
const size_t alignedSizeInBytes = D3DUtil::calcAligned(srcDesc.sizeInBytes, 256);
781+
auto d3dBindFlags = _calcResourceBindFlags(srcDesc.bindFlags);
782+
783+
size_t alignedSizeInBytes = srcDesc.sizeInBytes;
784+
785+
if(d3dBindFlags & D3D11_BIND_CONSTANT_BUFFER)
786+
{
787+
// Make aligned to 256 bytes... not sure why, but if you remove this the tests do fail.
788+
alignedSizeInBytes = D3DUtil::calcAligned(alignedSizeInBytes, 256);
789+
}
783790

784791
// Hack to make the initialization never read from out of bounds memory, by copying into a buffer
785792
List<uint8_t> initDataBuffer;
@@ -792,7 +799,7 @@ Result D3D11Renderer::createBufferResource(Resource::Usage initialUsage, const B
792799

793800
D3D11_BUFFER_DESC bufferDesc = { 0 };
794801
bufferDesc.ByteWidth = UINT(alignedSizeInBytes);
795-
bufferDesc.BindFlags = _calcResourceBindFlags(srcDesc.bindFlags);
802+
bufferDesc.BindFlags = d3dBindFlags;
796803
// For read we'll need to do some staging
797804
bufferDesc.CPUAccessFlags = _calcResourceAccessFlags(descIn.cpuAccessFlags & Resource::AccessFlag::Write);
798805
bufferDesc.Usage = D3D11_USAGE_DEFAULT;
@@ -1049,7 +1056,6 @@ Result D3D11Renderer::createBufferView(BufferResource* buffer, ResourceView::Des
10491056
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
10501057
uavDesc.Format = D3DUtil::getMapFormat(desc.format);
10511058
uavDesc.Buffer.FirstElement = 0;
1052-
uavDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes);
10531059

10541060
if(resourceDesc.elementSize)
10551061
{
@@ -1059,6 +1065,11 @@ Result D3D11Renderer::createBufferView(BufferResource* buffer, ResourceView::Des
10591065
{
10601066
uavDesc.Buffer.Flags |= D3D11_BUFFER_UAV_FLAG_RAW;
10611067
uavDesc.Format = DXGI_FORMAT_R32_TYPELESS;
1068+
uavDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes / 4);
1069+
}
1070+
else
1071+
{
1072+
uavDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes / RendererUtil::getFormatSize(desc.format));
10621073
}
10631074

10641075
ComPtr<ID3D11UnorderedAccessView> uav;
@@ -1077,16 +1088,34 @@ Result D3D11Renderer::createBufferView(BufferResource* buffer, ResourceView::Des
10771088
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
10781089
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
10791090
srvDesc.Format = D3DUtil::getMapFormat(desc.format);
1080-
srvDesc.Buffer.ElementOffset = 0;
1081-
srvDesc.Buffer.ElementWidth = 1;
10821091
srvDesc.Buffer.FirstElement = 0;
1083-
srvDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes);
10841092

10851093
if(resourceDesc.elementSize)
10861094
{
1087-
srvDesc.Buffer.ElementWidth = resourceDesc.elementSize;
10881095
srvDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes / resourceDesc.elementSize);
10891096
}
1097+
else if(desc.format == Format::Unknown)
1098+
{
1099+
// We need to switch to a different member of the `union`,
1100+
// so that we can set the `BufferEx.Flags` member.
1101+
//
1102+
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
1103+
1104+
// Because we've switched, we need to re-set the `FirstElement`
1105+
// field to be valid, since we can't count on all compilers
1106+
// to respect that `Buffer.FirstElement` and `BufferEx.FirstElement`
1107+
// alias in memory.
1108+
//
1109+
srvDesc.BufferEx.FirstElement = 0;
1110+
1111+
srvDesc.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
1112+
srvDesc.Format = DXGI_FORMAT_R32_TYPELESS;
1113+
srvDesc.BufferEx.NumElements = UINT(resourceDesc.sizeInBytes / 4);
1114+
}
1115+
else
1116+
{
1117+
srvDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes / RendererUtil::getFormatSize(desc.format));
1118+
}
10901119

10911120
ComPtr<ID3D11ShaderResourceView> srv;
10921121
SLANG_RETURN_ON_FAIL(m_device->CreateShaderResourceView(resourceImpl->m_buffer, &srvDesc, srv.writeRef()));

tools/gfx/render-d3d12.cpp

+15-2
Original file line numberDiff line numberDiff line change
@@ -2255,7 +2255,6 @@ Result D3D12Renderer::createBufferView(BufferResource* buffer, ResourceView::Des
22552255
uavDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
22562256
uavDesc.Format = D3DUtil::getMapFormat(desc.format);
22572257
uavDesc.Buffer.FirstElement = 0;
2258-
uavDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes);
22592258

22602259
if(resourceDesc.elementSize)
22612260
{
@@ -2266,6 +2265,11 @@ Result D3D12Renderer::createBufferView(BufferResource* buffer, ResourceView::Des
22662265
{
22672266
uavDesc.Buffer.Flags |= D3D12_BUFFER_UAV_FLAG_RAW;
22682267
uavDesc.Format = DXGI_FORMAT_R32_TYPELESS;
2268+
uavDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes / 4);
2269+
}
2270+
else
2271+
{
2272+
uavDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes / RendererUtil::getFormatSize(desc.format));
22692273
}
22702274

22712275

@@ -2284,13 +2288,22 @@ Result D3D12Renderer::createBufferView(BufferResource* buffer, ResourceView::Des
22842288
srvDesc.Format = D3DUtil::getMapFormat(desc.format);
22852289
srvDesc.Buffer.StructureByteStride = 0;
22862290
srvDesc.Buffer.FirstElement = 0;
2287-
srvDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes);
22882291

22892292
if(resourceDesc.elementSize)
22902293
{
22912294
srvDesc.Buffer.StructureByteStride = resourceDesc.elementSize;
22922295
srvDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes / resourceDesc.elementSize);
22932296
}
2297+
else if(desc.format == Format::Unknown)
2298+
{
2299+
srvDesc.Buffer.Flags |= D3D12_BUFFER_SRV_FLAG_RAW;
2300+
srvDesc.Format = DXGI_FORMAT_R32_TYPELESS;
2301+
srvDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes / 4);
2302+
}
2303+
else
2304+
{
2305+
srvDesc.Buffer.NumElements = UINT(resourceDesc.sizeInBytes / RendererUtil::getFormatSize(desc.format));
2306+
}
22942307

22952308
SLANG_RETURN_ON_FAIL(m_viewAllocator.allocate(&viewImpl->m_descriptor));
22962309
m_device->CreateShaderResourceView(resourceImpl->m_resource, &srvDesc, viewImpl->m_descriptor.cpuHandle);

0 commit comments

Comments
 (0)