
Commit 40d8f3a

CPU Performance/Testing improvements (shader-slang#1055)
* First pass of render-test refactor.
* Make window construction a function that can choose an implementation.
* Remove OpenGL, as it currently has a Windows dependency.
* Disable Vulkan, as the Renderer implementation has a dependency on Windows.
* Pass Window in as a parameter of 'update'.
* Add win-window.cpp, as it was missing.
* Fix warning on Windows about signs during comparison.
* Added mechanism to add random arrays as buffer inputs and select type.
* Improved RenderGenerator to generate more types, and to be more careful around int32 ranges.
* Added support for security checks (for Visual Studio C++).
* Disable exception handling being on by default when compiling kernels.
* Added a 'Group' version of the entry point that will evaluate all threads in a group in a single call. In test code use this method if available.
* Added -compile-arg to be able to pass arguments to the compilation within render-test.
* Add documentation for the _Group execution feature.
* Fix some typos in cpu-target.md.
1 parent c2e5d24 commit 40d8f3a

18 files changed, +469 -83 lines changed

docs/cpu-target.md

+17
@@ -112,6 +112,23 @@ When compiled into a shared library/dll - how is it invoked? The entry point is
 void computeMain(ComputeVaryingInput* varyingInput, UniformEntryPointParams* uniformParams, UniformState* uniformState);
 ```
 
+
+If compiled with `SLANG_HOST_CALLABLE`, the `ISlangSharedLibrary` will export a function named `computeMain`, the same name as the entry point in the original source.
+
+ComputeVaryingInput is defined in the prelude as
+
+```
+struct ComputeVaryingInput
+{
+    uint3 groupID;
+    uint3 groupThreadID;
+};
+```
+
+Typically, invoking the kernel is a matter of updating groupID/groupThreadID to specify which 'thread' of the computation to execute. For the example above we have `[numthreads(4, 1, 1)]`. This means groupThreadID.x can vary from 0-3, and .y and .z must be 0. groupID.x indicates which 'group of 4' to execute. So groupID.x = 1 with groupThreadID.x = 0,1,2,3 runs threads 4, 5, 6 and 7 (counting from 0). Being able to invoke each thread in this way is flexible, in that any specific thread can be specified and executed. It is not necessarily very efficient, because there is call overhead and a small amount of extra work performed inside the kernel.
+
+For improved performance there is a mechanism to execute a 'thread group' in a single invocation. A function with the same signature is exposed with the entry point name postfixed with `_Group`; in the example above the function would be called `computeMain_Group`. When calling this function only the groupID need be specified; the groupThreadID is ignored. All of the threads within the group (as specified by `[numthreads]`) are executed in a single call.
+
 The UniformState and UniformEntryPointParams struct typically vary by shader. UniformState holds 'normal' bindings, whereas UniformEntryPointParams hold the uniform entry point parameters. Where specific bindings or parameters are located can be determined by reflection. The structures for the example above would be something like the following...
 
 ```
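To make the difference between the per-thread and `_Group` entry points in docs/cpu-target.md concrete, here is a minimal host-side sketch. It does not load a real Slang shared library: `computeMain` and `computeMain_Group` below are hypothetical mock kernels that only record which thread index they ran, `uint3`, `UniformEntryPointParams` and `UniformState` are stand-in types, and `kThreadsX`, `runPerThread` and `runGroup` are names invented for illustration. Only the function signature and the `ComputeVaryingInput` field names are taken from the documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in types (assumptions); only the ComputeVaryingInput fields come from the doc.
struct uint3 { uint32_t x, y, z; };
struct ComputeVaryingInput { uint3 groupID; uint3 groupThreadID; };
struct UniformEntryPointParams {};
struct UniformState { std::vector<uint32_t> visited; };

static const uint32_t kThreadsX = 4; // mirrors [numthreads(4, 1, 1)]

// Mock per-thread entry point: records the dispatch thread index along x.
void computeMain(ComputeVaryingInput* in, UniformEntryPointParams*, UniformState* state)
{
    assert(in->groupThreadID.x < kThreadsX);
    state->visited.push_back(in->groupID.x * kThreadsX + in->groupThreadID.x);
}

// Mock group entry point: runs every thread of the group, ignoring groupThreadID.
void computeMain_Group(ComputeVaryingInput* in, UniformEntryPointParams*, UniformState* state)
{
    const uint32_t start = in->groupID.x * kThreadsX;
    for (uint32_t x = start; x < start + kThreadsX; ++x)
        state->visited.push_back(x);
}

// One call per thread: the host updates groupThreadID between calls.
std::vector<uint32_t> runPerThread(uint32_t groupX)
{
    UniformEntryPointParams params;
    UniformState state;
    ComputeVaryingInput in = {};
    in.groupID.x = groupX;
    for (uint32_t t = 0; t < kThreadsX; ++t)
    {
        in.groupThreadID.x = t;
        computeMain(&in, &params, &state);
    }
    return state.visited;
}

// One call per group: only groupID is set.
std::vector<uint32_t> runGroup(uint32_t groupX)
{
    UniformEntryPointParams params;
    UniformState state;
    ComputeVaryingInput in = {};
    in.groupID.x = groupX;
    computeMain_Group(&in, &params, &state);
    return state.visited;
}
```

Both helpers visit the same thread indices for a given group; the group form simply replaces four calls with one.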

source/core/slang-cpp-compiler.h

+2-1
@@ -95,7 +95,8 @@ class CPPCompiler: public RefObject
         enum Enum : Flags
         {
             EnableExceptionHandling = 0x01,
-            Verbose = 0x02,
+            Verbose = 0x02,
+            EnableSecurityChecks = 0x04,
         };
     };

source/core/slang-random-generator.cpp

+21-3
@@ -32,15 +32,33 @@ int64_t RandomGenerator::nextInt64()
     return (int64_t(high) << 32) | low;
 }
 
-int32_t RandomGenerator::nextInt32InRange(int32_t min, int32_t max)
+uint32_t RandomGenerator::nextUInt32InRange(uint32_t min, uint32_t max)
 {
-    int32_t diff = max - min;
+    // Make sure max is at least min
+    max = (max >= min) ? max : min;
+
+    // max >= min here, so the unsigned subtraction cannot wrap
+    uint32_t diff = max - min;
     if (diff <= 1)
     {
         return min;
     }
+    return (nextUInt32() % diff) + min;
+}
 
-    return (nextPositiveInt32() % diff) + min;
+
+int32_t RandomGenerator::nextInt32InRange(int32_t min, int32_t max)
+{
+    // Make sure max is at least min
+    max = (max >= min) ? max : min;
+
+    // Make 64 bit so can be lazier than having to take care of 32 bit overflow/underflow issues
+    uint32_t diff = uint32_t(int64_t(max) - int64_t(min));
+    if (diff <= 1)
+    {
+        return min;
+    }
+    return int32_t(int64_t(nextUInt32() % diff) + min);
 }
 
 int64_t RandomGenerator::nextInt64InRange(int64_t min, int64_t max)
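The reason for widening to 64 bits in `nextInt32InRange` is that a plain `int32_t` subtraction `max - min` overflows for wide ranges (for example `INT32_MAX - INT32_MIN`), which is undefined behaviour in C++. The sketch below isolates that step. `rangeWidth` and `mapToRange` are hypothetical helper names, not part of `RandomGenerator`, and `randomBits` stands in for the result of `nextUInt32()`.

```cpp
#include <cassert>
#include <cstdint>

// Width of [min, max), computed in 64 bits so the subtraction cannot overflow.
uint32_t rangeWidth(int32_t min, int32_t max)
{
    if (max < min)
        max = min; // clamp, as the diff does
    return uint32_t(int64_t(max) - int64_t(min));
}

// Maps an arbitrary 32-bit word into [min, max).
// A width of 0 or 1 leaves only one possible value, so min is returned.
int32_t mapToRange(uint32_t randomBits, int32_t min, int32_t max)
{
    const uint32_t diff = rangeWidth(min, max);
    if (diff <= 1)
        return min;
    const int64_t value = int64_t(randomBits % diff) + int64_t(min);
    assert(value >= min && value < max);
    return int32_t(value);
}
```

Note that, like the original, this uses a simple modulo, so the distribution is slightly biased whenever the width does not divide 2^32 evenly; that is usually acceptable for generating test inputs.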

source/core/slang-random-generator.h

+7-1
@@ -30,6 +30,9 @@ class RandomGenerator: public RefObject
         /// Get the next bool
     virtual bool nextBool();
 
+        /// Next uint32_t
+    uint32_t nextUInt32() { return uint32_t(nextInt32()); }
+
         /// Next Int32 which can only be positive
     int32_t nextPositiveInt32() { return nextInt32() & 0x7fffffff; }
         /// Next Int64 which can only be positive
@@ -38,9 +41,12 @@ class RandomGenerator: public RefObject
         /// Returns value up to BUT NOT INCLUDING maxValue.
     int32_t nextInt32UpTo(int32_t maxValue) { assert(maxValue > 0); return (maxValue <= 1) ? 0 : (nextPositiveInt32() % maxValue); }
 
-        /// Returns value from min up to BUT NOT INCLUDING max
+        /// Returns value from min up to BUT NOT INCLUDING max.
     int32_t nextInt32InRange(int32_t min, int32_t max);
 
+        /// Returns value from min up to BUT NOT INCLUDING max
+    uint32_t nextUInt32InRange(uint32_t min, uint32_t max);
+
         /// Returns value up to BUT NOT INCLUDING maxValue
     int64_t nextInt64UpTo(int64_t maxValue) { assert(maxValue > 0); return (maxValue <= 1) ? 0 : (nextPositiveInt64() % maxValue); }
 

source/core/slang-visual-studio-compiler-util.cpp

+9
@@ -95,6 +95,15 @@ namespace Slang
         // Doesn't appear to be a VS equivalent
     }
 
+    if (options.flags & CompileOptions::Flag::EnableSecurityChecks)
+    {
+        cmdLine.addArg("/GS");
+    }
+    else
+    {
+        cmdLine.addArg("/GS-");
+    }
+
     switch (options.debugInfoType)
     {
         default:
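The new flag travels from a bitmask to an MSVC switch: `/GS` turns buffer security checks on and `/GS-` turns them off. A minimal standalone sketch of that mapping follows, together with the mask-clearing idiom the commit uses in slang-compiler.cpp when compiling kernels. The free functions `securityArgs` and `kernelFlags` and the use of `std::vector<std::string>` are assumptions made for illustration; the real code uses `CompileOptions` and a `cmdLine` object.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

typedef uint32_t Flags;

// Same bit values as the Flag enum in slang-cpp-compiler.h.
enum Flag : Flags
{
    EnableExceptionHandling = 0x01,
    Verbose = 0x02,
    EnableSecurityChecks = 0x04,
};

// Always emits an explicit switch, so the compiler default never applies.
std::vector<std::string> securityArgs(Flags flags)
{
    std::vector<std::string> args;
    args.push_back((flags & EnableSecurityChecks) ? "/GS" : "/GS-");
    return args;
}

// Clears the two bits that are unwanted for generated kernels, leaving the others intact.
Flags kernelFlags(Flags flags)
{
    const Flags cleared = flags & ~Flags(EnableExceptionHandling | EnableSecurityChecks);
    assert((cleared & EnableSecurityChecks) == 0);
    return cleared;
}
```

Emitting `/GS-` explicitly in the else branch matters because the switch is otherwise left to the compiler's default setting.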

source/slang/slang-compiler.cpp

+5-1
@@ -1357,11 +1357,15 @@ SlangResult dissassembleDXILUsingDXC(
             }
         }
 
-        CPPCompiler::CompileOptions options;
+        typedef CPPCompiler::CompileOptions CompileOptions;
+        CompileOptions options;
 
         // Set the source type
         options.sourceType = (rawSourceLanguage == SourceLanguage::C) ? CPPCompiler::SourceType::C : CPPCompiler::SourceType::CPP;
 
+        // Disable exceptions and security checks
+        options.flags &= ~(CompileOptions::Flag::EnableExceptionHandling | CompileOptions::Flag::EnableSecurityChecks);
+
         // Generate a path a temporary filename for output module
         String modulePath;
         SLANG_RETURN_ON_FAIL(File::generateTemporary(UnownedStringSlice::fromLiteral("slang-generated"), modulePath));

source/slang/slang-emit-cpp.cpp

+167-54
@@ -2463,6 +2463,103 @@ struct GlobalParamInfo
     UInt size;
 };
 
+void CPPSourceEmitter::_emitEntryPointDefinitionStart(IRFunc* func, IRGlobalParam* entryPointGlobalParams, const String& funcName)
+{
+    auto resultType = func->getResultType();
+
+    auto entryPointLayout = asEntryPoint(func);
+
+    // Emit the actual function
+    emitEntryPointAttributes(func, entryPointLayout);
+    emitType(resultType, funcName);
+
+    m_writer->emit("(ComputeVaryingInput* varyingInput, UniformEntryPointParams* params, UniformState* uniformState)\n{\n");
+    emitSemantics(func);
+
+    m_writer->indent();
+    // Initialize when constructing so that globals are zeroed
+    m_writer->emit("Context context = {};\n");
+    m_writer->emit("context.uniformState = uniformState;\n");
+    m_writer->emit("context.varyingInput = *varyingInput;\n");
+
+    if (entryPointGlobalParams)
+    {
+        auto varDecl = entryPointGlobalParams;
+        auto rawType = varDecl->getDataType();
+
+        auto varType = rawType;
+
+        m_writer->emit("context.");
+        m_writer->emit(getName(varDecl));
+        m_writer->emit(" = (");
+        emitType(varType);
+        m_writer->emit("*)params; \n");
+    }
+}
+
+void CPPSourceEmitter::_emitEntryPointDefinitionEnd(IRFunc* func)
+{
+    SLANG_UNUSED(func);
+    m_writer->dedent();
+    m_writer->emit("}\n");
+}
+
+// We want to order such that the largest range is the inner loop
+
+void CPPSourceEmitter::_emitEntryPointGroup(const UInt sizeAlongAxis[3], const String& funcName)
+{
+    struct AxisWithSize
+    {
+        typedef AxisWithSize ThisType;
+        bool operator<(const ThisType& rhs) const { return size < rhs.size; }
+
+        int axis;
+        UInt size;
+    };
+    List<AxisWithSize> axes;
+
+    for (int i = 0; i < 3; ++i)
+    {
+        if (sizeAlongAxis[i] > 1)
+        {
+            AxisWithSize axisWithSize;
+            axisWithSize.axis = i;
+            axisWithSize.size = sizeAlongAxis[i];
+            axes.add(axisWithSize);
+        }
+    }
+
+    axes.sort();
+
+    // Open all the loops
+    StringBuilder builder;
+    for (Index i = 0; i < axes.getCount(); ++i)
+    {
+        const auto& axis = axes[i];
+        builder.Clear();
+        const char elem[2] = { s_elemNames[axis.axis], 0 };
+        builder << "for (uint32_t " << elem << " = start." << elem << "; " << elem << " < start." << elem << " + " << axis.size << "; ++" << elem << ")\n{\n";
+        m_writer->emit(builder);
+        m_writer->indent();
+
+        builder.Clear();
+        builder << "context.dispatchThreadID." << elem << " = " << elem << ";\n";
+        m_writer->emit(builder);
+    }
+
+    // just call at inner loop point
+    m_writer->emit("context._");
+    m_writer->emit(funcName);
+    m_writer->emit("();\n");
+
+    // Close all the loops
+    for (Index i = Index(axes.getCount() - 1); i >= 0; --i)
+    {
+        m_writer->dedent();
+        m_writer->emit("}\n");
+    }
+}
+
 void CPPSourceEmitter::emitModuleImpl(IRModule* module)
 {
     List<EmitAction> actions;
@@ -2600,77 +2697,93 @@ void CPPSourceEmitter::emitModuleImpl(IRModule* module)
             auto entryPointLayout = asEntryPoint(func);
             if (entryPointLayout)
             {
-                auto resultType = func->getResultType();
-                auto name = getFuncName(func);
+                // https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/sv-dispatchthreadid
+                // SV_DispatchThreadID is the sum of SV_GroupID * numthreads and GroupThreadID.
 
-                // Emit the actual function
-                emitEntryPointAttributes(func, entryPointLayout);
-                emitType(resultType, name);
+                static const UInt kAxisCount = 3;
+                UInt sizeAlongAxis[kAxisCount];
 
-                m_writer->emit("(ComputeVaryingInput* varyingInput, UniformEntryPointParams* params, UniformState* uniformState)\n{\n");
-                emitSemantics(func);
+                String funcName = getFuncName(func);
 
-                m_writer->indent();
-                // Initialize when constructing so that globals are zeroed
-                m_writer->emit("Context context = {};\n");
-                m_writer->emit("context.uniformState = uniformState;\n");
-                m_writer->emit("context.varyingInput = *varyingInput;\n");
+                {
+                    _emitEntryPointDefinitionStart(func, entryPointGlobalParams, funcName);
 
-                if (entryPointGlobalParams)
-                {
-                    auto varDecl = entryPointGlobalParams;
-                    auto rawType = varDecl->getDataType();
+                    // Emit dispatchThreadID
+                    if (entryPointLayout->profile.GetStage() == Stage::Compute)
+                    {
+                        // TODO: this is kind of gross because we are using a public
+                        // reflection API function, rather than some kind of internal
+                        // utility it forwards to...
+                        spReflectionEntryPoint_getComputeThreadGroupSize((SlangReflectionEntryPoint*)entryPointLayout, kAxisCount, &sizeAlongAxis[0]);
 
-                    auto varType = rawType;
+                        m_writer->emit("context.dispatchThreadID = {\n");
+                        m_writer->indent();
 
-                    m_writer->emit("context.");
-                    m_writer->emit(getName(varDecl));
-                    m_writer->emit(" = (");
-                    emitType(varType);
-                    m_writer->emit("*)params; \n");
-                }
-
-                // Emit dispatchThreadID
-                if (entryPointLayout->profile.GetStage() == Stage::Compute)
-                {
-                    // https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/sv-dispatchthreadid
-                    // SV_DispatchThreadID is the sum of SV_GroupID * numthreads and GroupThreadID.
+                        StringBuilder builder;
+                        for (int i = 0; i < kAxisCount; ++i)
+                        {
+                            builder.Clear();
+                            const char elem[2] = {s_elemNames[i], 0};
+                            builder << "varyingInput->groupID." << elem << " * " << sizeAlongAxis[i] << " + varyingInput->groupThreadID." << elem;
+                            if (i < kAxisCount - 1)
+                            {
+                                builder << ",";
+                            }
+                            builder << "\n";
+                            m_writer->emit(builder);
+                        }
 
-                    static const UInt kAxisCount = 3;
-                    UInt sizeAlongAxis[kAxisCount];
+                        m_writer->dedent();
+                        m_writer->emit("};\n");
+                    }
 
-                    // TODO: this is kind of gross because we are using a public
-                    // reflection API function, rather than some kind of internal
-                    // utility it forwards to...
-                    spReflectionEntryPoint_getComputeThreadGroupSize((SlangReflectionEntryPoint*)entryPointLayout, kAxisCount, &sizeAlongAxis[0]);
+                    m_writer->emit("context._");
+                    m_writer->emit(funcName);
+                    m_writer->emit("();\n");
 
-                    m_writer->emit("context.dispatchThreadID = {\n");
-                    m_writer->indent();
+                    _emitEntryPointDefinitionEnd(func);
+                }
 
+                // Emit the group version which runs for all elements in a thread group
+                {
                     StringBuilder builder;
-
-                    for (int i = 0; i < kAxisCount; ++i)
+                    builder << getFuncName(func);
+                    builder << "_Group";
+
+                    String groupFuncName = builder;
+
+                    _emitEntryPointDefinitionStart(func, entryPointGlobalParams, groupFuncName);
+
+                    // Emit dispatchThreadID
+                    if (entryPointLayout->profile.GetStage() == Stage::Compute)
                     {
-                        builder.Clear();
-                        const char elem[2] = {s_elemNames[i], 0};
-                        builder << "varyingInput->groupID." << elem << " * " << sizeAlongAxis[i] << " + varyingInput->groupThreadID." << elem;
-                        if (i < kAxisCount - 1)
+                        spReflectionEntryPoint_getComputeThreadGroupSize((SlangReflectionEntryPoint*)entryPointLayout, kAxisCount, &sizeAlongAxis[0]);
+
                         {
-                            builder << ",";
+                            m_writer->emit("const uint3 start = {\n");
+                            m_writer->indent();
+                            for (int i = 0; i < kAxisCount; ++i)
+                            {
+                                builder.Clear();
+                                const char elem[2] = { s_elemNames[i], 0 };
+                                builder << "varyingInput->groupID." << elem << " * " << sizeAlongAxis[i];
+                                if (i < kAxisCount - 1)
+                                {
+                                    builder << ",";
+                                }
+                                builder << "\n";
+                                m_writer->emit(builder);
+                            }
+                            m_writer->dedent();
+                            m_writer->emit("};\n");
                         }
-                        builder << "\n";
-                        m_writer->emit(builder);
+                        m_writer->emit("context.dispatchThreadID = start;\n");
+
+                        _emitEntryPointGroup(sizeAlongAxis, funcName);
                     }
 
-                    m_writer->dedent();
-                    m_writer->emit("};\n");
+                    _emitEntryPointDefinitionEnd(func);
                 }
-
-                m_writer->emit("context._");
-                m_writer->emit(name);
-                m_writer->emit("();\n");
-                m_writer->dedent();
-                m_writer->emit("}\n");
             }
         }
     }
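To see what `_emitEntryPointGroup` produces, the following standalone sketch reimplements its loop-nest generation with standard library types. It is an approximation under stated assumptions: `emitGroupLoops` is a hypothetical name, `std::ostringstream` replaces `StringBuilder` and the writer, indentation is omitted, and `std::stable_sort` stands in for `List::sort` (whose tie-breaking behaviour is not shown in the diff).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Returns the text of the loop nest that calls context._<funcName>() once per thread.
std::string emitGroupLoops(const uint32_t sizeAlongAxis[3], const std::string& funcName)
{
    assert(!funcName.empty());
    static const char kElemNames[] = "xyz";

    struct Axis { int axis; uint32_t size; };
    std::vector<Axis> axes;
    for (int i = 0; i < 3; ++i)
    {
        // Axes of size 1 need no loop: dispatchThreadID already equals start there.
        if (sizeAlongAxis[i] > 1)
            axes.push_back(Axis{i, sizeAlongAxis[i]});
    }

    // Ascending by size: the smallest axis opens first, so the largest ends up innermost.
    std::stable_sort(axes.begin(), axes.end(),
                     [](const Axis& a, const Axis& b) { return a.size < b.size; });

    std::ostringstream out;
    for (const Axis& a : axes)
    {
        const char e = kElemNames[a.axis];
        out << "for (uint32_t " << e << " = start." << e << "; " << e << " < start." << e
            << " + " << a.size << "; ++" << e << ")\n{\n";
        out << "context.dispatchThreadID." << e << " = " << e << ";\n";
    }

    out << "context._" << funcName << "();\n";

    for (size_t i = 0; i < axes.size(); ++i)
        out << "}\n";
    return out.str();
}
```

For a group size of (4, 2, 1) this yields an outer loop over `y`, an inner loop over `x`, and no loop for `z`; for (1, 1, 1) it degenerates to a single call.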

source/slang/slang-emit-cpp.h

+4
@@ -257,6 +257,10 @@ class CPPSourceEmitter: public CLikeSourceEmitter
 
     SlangResult _calcTextureTypeName(IRTextureTypeBase* texType, StringBuilder& outName);
 
+    void _emitEntryPointDefinitionStart(IRFunc* func, IRGlobalParam* entryPointGlobalParams, const String& funcName);
+    void _emitEntryPointDefinitionEnd(IRFunc* func);
+    void _emitEntryPointGroup(const UInt sizeAlongAxis[3], const String& funcName);
+
     Dictionary<SpecializedIntrinsic, StringSlicePool::Handle> m_intrinsicNameMap;
     Dictionary<IRType*, StringSlicePool::Handle> m_typeNameMap;

tests/compute/array-param.slang

+1-1
@@ -1,4 +1,4 @@
-//TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute
+//TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute -compile-arg -O3
 //TEST(compute):COMPARE_COMPUTE_EX:-slang -compute
 //TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12
 //TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute
