Skip to content

Commit 35dfe37

Browse files
author
Tim Foley
authored
Fix some subtle bugs in D3D constant buffer layout (shader-slang#771)
* Fix some subtle bugs in D3D constant buffer layout The root of the issue here is that the D3D constant buffer layout rules require 16-byte alignment for arrays and structures, but they do *not* round up the size of an array/structure type to account for that alignment. That means that in cases like the following: ```hlsl cbuffer C0 { float3 a[2]; float c0; } struct A { float4 x; float3 y; }; cbuffer C1 { A a; float c1; } ``` The `c0` and `c1` fields get an offset of 28 and not 32 like you might expect if the preceding array/structure field `a` had been padded out to match its 16-byte alignment. The actual fix here is relatively simple, and mostly amount to shuffling around some code in `type-layout.cpp` to ensure that the D3D constant buffer layout don't inherit the logic that was rounding up array/structure sizes. Along the way I took the opportunity to clean up the inheritance hierarchy by making the GLSL-family layout rules not try to share anythign with the D3D family (not that there is very little to share), which in turn allowed for some simplification of the GLSL side of things. Fixing this behavior changed the output of a few reflection tests. In the case of `tests/reflection/arrays.hlsl` the output confirmed that we had been producing bad reflection information in these kinds of cases. The output for `tests/reflection/matrix-layout.slang` also showed some bugs in our reflection, but these were overall more minor: we mis-reported the size of certain matrices as 64 bytes instead of 60, and as a result also computed the size of the overall constant buffer as 4 bytes bigger than needed. In all of these cases I double-checked the expected output against dxc to make sure that the new offsets/sizes are what we should have been producing in the first place. I also updated the reflection test harness to start outputting layout information for the element type of a structured buffer, which changed the output of `tests/reflection/structured-buffer.slang`, but this didn't show any change in what we reported: it is just information that wasn't in the output to begin with. Finally, I added two new tests around these subtle cases of buffer layout behavior (especially subtle because it varies across target APIs). The `tests/compute/buffer-layout.slang` test simply sets up a type to ilustrate the troublesome scenarios and then embeds it in both a constant buffer and structured buffer that will be backed by memory with sequential `int` values. We then read out the value of a field as a way to probe its de facto *offset* at runtime. This test doesn't really stress the Slang compiler (except for our ability to pass through the same type declarations to downstream compilers), but it is useful to confirm our expectations about where things land in memory. The `tests/reflection/buffer-layout.slang` test then uses the reflection test infrastructure to confirm that the same type declarations used in the compute test produce the expected offsets in our reported reflection information. Before the fixes in this change this test showed us producing dangerously incorrect results in our D3D reflection information, which has now been fixed to match the empirically-determined offsets from the compute test. * fixups based on review feedback
1 parent 7ba0a80 commit 35dfe37

14 files changed

+944
-144
lines changed

source/slang/type-layout.cpp

+171-63
Original file line numberDiff line numberDiff line change
@@ -78,13 +78,36 @@ struct DefaultLayoutRulesImpl : SimpleLayoutRulesImpl
7878
SimpleArrayLayoutInfo GetArrayLayout( SimpleLayoutInfo elementInfo, LayoutSize elementCount) override
7979
{
8080
SLANG_RELEASE_ASSERT(elementInfo.size.isFinite());
81-
auto stride = elementInfo.size.getFiniteValue();
81+
auto elementSize = elementInfo.size.getFiniteValue();
82+
auto elementAlignment = elementInfo.alignment;
83+
auto elementStride = RoundToAlignment(elementSize, elementAlignment);
84+
85+
// An array with no elements will have zero size.
86+
//
87+
LayoutSize arraySize = 0;
88+
//
89+
// Any array with a non-zero number of elements will need
90+
// to have space for N elements of size `elementSize`, with
91+
// the constraints that there must be `elementStride` bytes
92+
// between consecutive elements.
93+
//
94+
if( elementCount > 0 )
95+
{
96+
// We can think of this as either allocating (N-1)
97+
// chunks of size `elementStride` (for most of the elements)
98+
// and then one final chunk of size `elementSize` for
99+
// the last element, or equivalently as allocating
100+
// N chunks of size `elementStride` and then "giving back"
101+
// the final `elementStride - elementSize` bytes.
102+
//
103+
arraySize = (elementStride * (elementCount-1)) + elementSize;
104+
}
82105

83106
SimpleArrayLayoutInfo arrayInfo;
84107
arrayInfo.kind = elementInfo.kind;
85-
arrayInfo.size = stride * elementCount;
86-
arrayInfo.alignment = elementInfo.alignment;
87-
arrayInfo.elementStride = stride;
108+
arrayInfo.size = arraySize;
109+
arrayInfo.alignment = elementAlignment;
110+
arrayInfo.elementStride = elementStride;
88111
return arrayInfo;
89112
}
90113

@@ -125,86 +148,179 @@ struct DefaultLayoutRulesImpl : SimpleLayoutRulesImpl
125148
if(fieldInfo.size == 0)
126149
return ioStructInfo->size;
127150

151+
// A struct type must be at least as aligned as its most-aligned field.
128152
ioStructInfo->alignment = std::max(ioStructInfo->alignment, fieldInfo.alignment);
129-
ioStructInfo->size = RoundToAlignment(ioStructInfo->size, fieldInfo.alignment);
130-
LayoutSize fieldOffset = ioStructInfo->size;
131-
ioStructInfo->size += fieldInfo.size;
153+
154+
// The new field will be added to the end of the struct.
155+
auto fieldBaseOffset = ioStructInfo->size;
156+
157+
// We need to ensure that the offset for the field will respect its alignment
158+
auto fieldOffset = RoundToAlignment(fieldBaseOffset, fieldInfo.alignment);
159+
160+
// The size of the struct must be adjusted to cover the bytes consumed
161+
// by this field.
162+
ioStructInfo->size = fieldOffset + fieldInfo.size;
163+
132164
return fieldOffset;
133165
}
134166

135167

136168
void EndStructLayout(UniformLayoutInfo* ioStructInfo) override
137169
{
170+
SLANG_UNUSED(ioStructInfo);
171+
172+
// Note: A traditional C layout algorithm would adjust the size
173+
// of a struct type so that it is a multiple of the alignment.
174+
// This is a parsimonious design choice because it means that
175+
// `sizeof(T)` can both be used when copying/allocating a single
176+
// value of type `T` or an array of N values, without having to
177+
// consider more details.
178+
//
179+
// Of course the choice also has down-sides in that wrapping things
180+
// into a `struct` can affect layout in ways that waste space. E.g.,
181+
// the following two cases don't lay out the same:
182+
//
183+
// struct S0 { double d; float f; float g; };
184+
//
185+
// struct X { double d; float f; }
186+
// struct S1 { X x; float g; }
187+
//
188+
// Even though `S0::g` and `S1::g` have the same amount of useful
189+
// data in front of them, they will not land at the same offset,
190+
// and the resulting struct sizes will differ (`sizeof(S0)` will be
191+
// 16 while `sizeof(S1)` will be 24).
192+
//
193+
// Slang doesn't get to be opinionated about this stuff because
194+
// there is already precedent in both HLSL and GLSL for types
195+
// that have a size that is not rounded up to their alignment.
196+
//
197+
// Our default layout rules won't implement the C-like policy,
198+
// and instead it will be injected in the concrete implementations
199+
// that require it.
200+
}
201+
};
202+
203+
/// Common behavior for GLSL-family layout.
204+
struct GLSLBaseLayoutRulesImpl : DefaultLayoutRulesImpl
205+
{
206+
typedef DefaultLayoutRulesImpl Super;
207+
208+
SimpleLayoutInfo GetVectorLayout(SimpleLayoutInfo elementInfo, size_t elementCount) override
209+
{
210+
// The `std140` and `std430` rules require vectors to be aligned to the next power of
211+
// two up from their size (so a `float2` is 8-byte aligned, and a `float3` is
212+
// 16-byte aligned).
213+
//
214+
// Note that in this case we have a type layout where the size is *not* a multiple
215+
// of the alignment, so it should be possible to pack a scalar after a `float3`.
216+
//
217+
SLANG_RELEASE_ASSERT(elementInfo.kind == LayoutResourceKind::Uniform);
218+
SLANG_RELEASE_ASSERT(elementInfo.size.isFinite());
219+
220+
auto size = elementInfo.size.getFiniteValue() * elementCount;
221+
SimpleLayoutInfo vectorInfo(
222+
LayoutResourceKind::Uniform,
223+
size,
224+
RoundUpToPowerOfTwo(size));
225+
return vectorInfo;
226+
}
227+
228+
SimpleArrayLayoutInfo GetArrayLayout( SimpleLayoutInfo elementInfo, LayoutSize elementCount) override
229+
{
230+
// The size of an array must be rounded up to be a multiple of its alignment.
231+
//
232+
auto info = Super::GetArrayLayout(elementInfo, elementCount);
233+
info.size = RoundToAlignment(info.size, info.alignment);
234+
return info;
235+
}
236+
237+
void EndStructLayout(UniformLayoutInfo* ioStructInfo) override
238+
{
239+
// The size of a `struct` must be rounded up to be a multiple of its alignment.
240+
//
138241
ioStructInfo->size = RoundToAlignment(ioStructInfo->size, ioStructInfo->alignment);
139242
}
140243
};
141244

142-
// Capture common behavior betwen HLSL and GLSL (`std140`) constnat buffer rules
143-
struct DefaultConstantBufferLayoutRulesImpl : DefaultLayoutRulesImpl
245+
/// The GLSL `std430` layout rules.
246+
struct Std430LayoutRulesImpl : GLSLBaseLayoutRulesImpl
144247
{
145-
// The `std140` rules require that all array elements
146-
// be a multiple of 16 bytes.
147-
//
148-
// HLSL agrees.
248+
// These rules don't actually need any differences from our
249+
// base/common GLSL layout rules.
250+
};
251+
252+
/// The GLSL `std430` layout rules.
253+
struct Std140LayoutRulesImpl : GLSLBaseLayoutRulesImpl
254+
{
255+
typedef GLSLBaseLayoutRulesImpl Super;
256+
149257
SimpleArrayLayoutInfo GetArrayLayout(SimpleLayoutInfo elementInfo, LayoutSize elementCount) override
150258
{
259+
// The `std140` rules require that array elements
260+
// be aligned on 16-byte boundaries.
261+
//
151262
if(elementInfo.kind == LayoutResourceKind::Uniform)
152263
{
153264
if (elementInfo.alignment < 16)
154265
elementInfo.alignment = 16;
155-
elementInfo.size = RoundToAlignment(elementInfo.size, elementInfo.alignment);
156266
}
157-
return DefaultLayoutRulesImpl::GetArrayLayout(elementInfo, elementCount);
267+
return Super::GetArrayLayout(elementInfo, elementCount);
158268
}
159269

160-
// The `std140` rules require that a `struct` type be
161-
// aligned to at least 16.
162-
//
163-
// HLSL agrees.
164270
UniformLayoutInfo BeginStructLayout() override
165271
{
272+
// The `std140` rules require that a `struct` type
273+
// be at least 16-byte aligned.
274+
//
166275
return UniformLayoutInfo(0, 16);
167276
}
168277
};
169278

170-
struct GLSLConstantBufferLayoutRulesImpl : DefaultConstantBufferLayoutRulesImpl
279+
struct HLSLConstantBufferLayoutRulesImpl : DefaultLayoutRulesImpl
171280
{
172-
};
281+
typedef DefaultLayoutRulesImpl Super;
173282

174-
// The `std140` and `std430` rules require vectors to be aligned to the next power of
175-
// two up from their size (so a `float2` is 8-byte aligned, and a `float3` is
176-
// 16-byte aligned).
177-
//
178-
// Note that in this case we have a type layout where the size is *not* a multiple
179-
// of the alignment, so it should be possible to pack a scalar after a `float3`.
180-
static SimpleLayoutInfo getGLSLVectorLayout(
181-
SimpleLayoutInfo elementInfo, size_t elementCount)
182-
{
183-
SLANG_RELEASE_ASSERT(elementInfo.kind == LayoutResourceKind::Uniform);
184-
SLANG_RELEASE_ASSERT(elementInfo.size.isFinite());
185-
186-
auto size = elementInfo.size.getFiniteValue() * elementCount;
187-
SimpleLayoutInfo vectorInfo(
188-
LayoutResourceKind::Uniform,
189-
size,
190-
RoundUpToPowerOfTwo(size));
191-
return vectorInfo;
192-
}
283+
// Similar to GLSL `std140` rules, an HLSL constant buffer requires that
284+
// `struct` and array types have 16-byte alignement.
285+
//
286+
// Unlike GLSL `std140`, the overall size of an array or `struct` type
287+
// is *not* rounded up to the alignment, so it is possible for later
288+
// fields to sneak into the "tail space" left behind by a preceding
289+
// structure or array. E.g., in this example:
290+
//
291+
// struct S { float3 a[2]; float b; };
292+
//
293+
// The stride of the array `a` is 16 bytes per element, but the size
294+
// of `a` will only be 28 bytes (not 32), so that `b` can fit into
295+
// the space after the last array element and the overall structure
296+
// will have a size of 32 bytes.
193297

194-
// The `std140` rules combine the GLSL-specific layout for 3-vectors with the
195-
// alignment padding for structures and arrays that is common to both HLSL
196-
// and GLSL constant buffers.
197-
struct Std140LayoutRulesImpl : GLSLConstantBufferLayoutRulesImpl
198-
{
199-
SimpleLayoutInfo GetVectorLayout(SimpleLayoutInfo elementInfo, size_t elementCount) override
298+
SimpleArrayLayoutInfo GetArrayLayout(SimpleLayoutInfo elementInfo, LayoutSize elementCount) override
200299
{
201-
return getGLSLVectorLayout(elementInfo, elementCount);
300+
if(elementInfo.kind == LayoutResourceKind::Uniform)
301+
{
302+
if (elementInfo.alignment < 16)
303+
elementInfo.alignment = 16;
304+
}
305+
return Super::GetArrayLayout(elementInfo, elementCount);
202306
}
203-
};
204307

205-
struct HLSLConstantBufferLayoutRulesImpl : DefaultConstantBufferLayoutRulesImpl
206-
{
207-
// Can't let a `struct` field straddle a register (16-byte) boundary
308+
UniformLayoutInfo BeginStructLayout() override
309+
{
310+
return UniformLayoutInfo(0, 16);
311+
}
312+
313+
// HLSL layout rules do *not* impose additional alignment
314+
// constraints on vectors (e.g., all of `float`, `float2`,
315+
// `float3`, and `float4` have 4-byte alignment), but instead
316+
// they impose a rule that any `struct` field must not
317+
// "straddle" a 16-byte boundary.
318+
//
319+
// This has the effect of making it *look* like `float4`
320+
// values have 16-byte alignment in practice, but the
321+
// effects on `float2` and `float3` are more nuanched and
322+
// lead to different result than the GLSL rules.
323+
//
208324
LayoutSize AddStructField(UniformLayoutInfo* ioStructInfo, UniformLayoutInfo fieldInfo) override
209325
{
210326
// Skip zero-size fields
@@ -234,18 +350,10 @@ struct HLSLConstantBufferLayoutRulesImpl : DefaultConstantBufferLayoutRulesImpl
234350

235351
struct HLSLStructuredBufferLayoutRulesImpl : DefaultLayoutRulesImpl
236352
{
237-
// TODO: customize these to be correct...
238-
};
239-
240-
// The `std430` rules don't include the array/structure alignment padding that
241-
// gets applied to constant buffers, but they do include the padding of 3-vectors
242-
// to be aligned as 4-vectors.
243-
struct Std430LayoutRulesImpl : DefaultLayoutRulesImpl
244-
{
245-
SimpleLayoutInfo GetVectorLayout(SimpleLayoutInfo elementInfo, size_t elementCount) override
246-
{
247-
return getGLSLVectorLayout(elementInfo, elementCount);
248-
}
353+
// HLSL structured buffers drop the restrictions added for constant buffers,
354+
// but retain the rules around not adjusting the size of an array or
355+
// structure to its alignment. In this way they should match our
356+
// default layout rules.
249357
};
250358

251359
struct DefaultVaryingLayoutRulesImpl : DefaultLayoutRulesImpl

source/slang/type-layout.h

+10
Original file line numberDiff line numberDiff line change
@@ -176,6 +176,16 @@ inline LayoutSize maximum(LayoutSize left, LayoutSize right)
176176
right.getFiniteValue()));
177177
}
178178

179+
inline bool operator>(LayoutSize left, LayoutSize::RawValue right)
180+
{
181+
return left.isInfinite() || (left.getFiniteValue() > right);
182+
}
183+
184+
inline bool operator<=(LayoutSize left, LayoutSize::RawValue right)
185+
{
186+
return left.isFinite() && (left.getFiniteValue() <= right);
187+
}
188+
179189
// Layout appropriate to "just memory" scenarios,
180190
// such as laying out the members of a constant buffer.
181191
struct UniformLayoutInfo

0 commit comments

Comments
 (0)