Commit 3a02c59

Author: Tim Foley
Specialize away resource-type function parameters (shader-slang#759)
* Specialize away resource-type function parameters

Work on shader-slang#397.

Introduction
------------

Suppose a user writes a function that takes a resource type as a parameter:

```hlsl
float4 getThing(RWStructuredBuffer<float4> buffer, int index)
{
    return buffer[index];
}
```

This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`:

```hlsl
RWStructuredBuffer<float4> gBuffer;
```

translates to a global GLSL `buffer` declaration:

```hlsl
buffer _S0 { float4 _data[]; } gBuffer;
```

There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function.

(Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy.)

The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`.

The solution implemented in this change works around some, but not all, of the problems with full inlining. The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites. Thus if we have code like:

```hlsl
RWStructuredBuffer<float4> gA;
RWStructuredBuffer<float4> gB[10];
...
getThing(gA, x);
getThing(gA, y);
getThing(gB[someVal], z);
```

we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`:

```hlsl
float4 getThing_gA(int index)
{
    return gA[index];
}

float4 getThing_gB(int _val, int index)
{
    return gB[_val][index];
}
```

and the call sites will change to match:

```hlsl
getThing_gA(x);
getThing_gA(y);
getThing_gB(someVal, z);
```

Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it.

While this description motivates the change based on GLSL output, the same basic issue can arise for other targets. For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters.

Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with functions that have return values or `out` parameters of resource types.

Implementation
--------------

The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand. The code in this pass is heavily commented, so hopefully it serves to explain itself all right.

After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output. To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a proper dead code elimination (DCE) pass in `ir-dce.{h,cpp}`. This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live.

When I first added the DCE pass it eliminated *everything* because there were no "roots" for liveness. I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on).

A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader and potentially break the rules for how ray tracing programs communicate. I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive *as if* it used it, without actually using it.

There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instructions to their names, so that the name of an instruction was getting incremented each time it was referenced.

Testing
-------

There are three different tests added as part of this change:

* The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but to fail for Vulkan without specialization.

* The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic.

* The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets.

* fixup: warnings/errors from other compilers

* fixup: typos and cleanup

* fixup: typos
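The call-site rewriting described in the Introduction can be illustrated with a small standalone C++ sketch. This is not the code in `ir-specialize-resources.cpp`: the `ResourceArg`, `Call`, and `Specializer` names are hypothetical, calls are modeled as strings rather than `IRCall` instructions, and the cache is keyed only on the callee and the global parameter's name, which is a simplification. It only shows the idea of generating specializations on demand and turning array indices into ordinary arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a resource argument at a call site: the global
// shader parameter it refers to, plus any array indices applied to it.
struct ResourceArg
{
    std::string globalName;              // e.g. "gA" or "gB"
    std::vector<std::string> indexExprs; // e.g. {"someVal"} for gB[someVal]
};

// Hypothetical model of a call: callee name plus argument expressions.
struct Call
{
    std::string callee;
    std::vector<std::string> args;
};

class Specializer
{
public:
    // Rewrite `callee(resource, otherArgs...)` into a call to a function
    // specialized on the identity of the global parameter. Array indices
    // become leading ordinary arguments of the specialized call.
    Call specializeCall(
        const std::string& callee,
        const ResourceArg& resource,
        const std::vector<std::string>& otherArgs)
    {
        assert(!resource.globalName.empty());

        // Create the specialization only the first time it is requested.
        auto key = std::make_pair(callee, resource.globalName);
        auto it = cache_.find(key);
        if (it == cache_.end())
        {
            it = cache_.emplace(key, callee + "_" + resource.globalName).first;
        }

        Call result;
        result.callee = it->second;
        result.args = resource.indexExprs;
        result.args.insert(result.args.end(), otherArgs.begin(), otherArgs.end());
        return result;
    }

    std::size_t specializationCount() const { return cache_.size(); }

private:
    std::map<std::pair<std::string, std::string>, std::string> cache_;
};
```

With the three calls from the example above, this sketch creates two cache entries (`getThing_gA` and `getThing_gB`), and the `gB[someVal]` call ends up with `someVal` as its first argument.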
1 parent d2ddc59 commit 3a02c59

22 files changed, +1920 -34 lines changed

source/slang/emit.cpp

+86 -19
@@ -2,9 +2,11 @@
 #include "emit.h"
 
 #include "../core/slang-writer.h"
+#include "ir-dce.h"
 #include "ir-insts.h"
 #include "ir-restructure.h"
 #include "ir-restructure-scoping.h"
+#include "ir-specialize-resources.h"
 #include "ir-ssa.h"
 #include "ir-validate.h"
 #include "legalize-types.h"
@@ -5603,21 +5605,56 @@ struct EmitVisitor
         emit("}\n");
     }
 
+    /// Emit the array brackets that go on the end of a declaration of the given type.
     void emitArrayBrackets(
         EmitContext* ctx,
-        IRType* type)
+        IRType* inType)
     {
         SLANG_UNUSED(ctx);
 
-        if(auto arrayType = as<IRArrayType>(type))
-        {
-            emit("[");
-            EmitVal(arrayType->getElementCount(), kEOp_General);
-            emit("]");
-        }
-        else if(auto unsizedArrayType = as<IRUnsizedArrayType>(type))
+        // A declaration may require zero, one, or
+        // more array brackets. When writing out array
+        // brackets from left to right, they represent
+        // the structure of the type from the "outside"
+        // in (that is, if we have a 5-element array of
+        // 3-element arrays we should output `[5][3]`),
+        // because of C-style declarator rules.
+        //
+        // This conveniently means that we can print
+        // out all the array brackets with a looping
+        // rather than a recursive structure.
+        //
+        // We will peel the input type like an onion,
+        // looking at one layer at a time until we
+        // reach a non-array type in the middle.
+        //
+        IRType* type = inType;
+        for(;;)
         {
-            emit("[]");
+            if(auto arrayType = as<IRArrayType>(type))
+            {
+                emit("[");
+                EmitVal(arrayType->getElementCount(), kEOp_General);
+                emit("]");
+
+                // Continue looping on the next layer in.
+                //
+                type = arrayType->getElementType();
+            }
+            else if(auto unsizedArrayType = as<IRUnsizedArrayType>(type))
+            {
+                emit("[]");
+
+                // Continue looping on the next layer in.
+                //
+                type = unsizedArrayType->getElementType();
+            }
+            else
+            {
+                // This layer wasn't an array, so we are done.
+                //
+                return;
+            }
         }
     }
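The "peel the type like an onion" loop in `emitArrayBrackets` can be tried outside the compiler with a minimal sketch. `ToyType` and `arrayBrackets` below are hypothetical stand-ins, not Slang's `IRArrayType`/`IRUnsizedArrayType` API, and the function returns a string instead of calling `emit`. It only illustrates why an iterative walk from the outermost array layer inward yields brackets in C declarator order.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy type node; the real IR types in the diff are richer.
struct ToyType
{
    enum class Kind { Scalar, Array, UnsizedArray };
    Kind kind;
    int elementCount;           // only meaningful for Kind::Array
    const ToyType* elementType; // null for Kind::Scalar
};

// Collect the array brackets for a declaration of the given type,
// walking from the outermost array layer to the innermost.
std::string arrayBrackets(const ToyType* type)
{
    std::string out;
    for (;;)
    {
        assert(type != nullptr);
        switch (type->kind)
        {
        case ToyType::Kind::Array:
            out += "[" + std::to_string(type->elementCount) + "]";
            type = type->elementType; // continue with the next layer in
            break;
        case ToyType::Kind::UnsizedArray:
            out += "[]";
            type = type->elementType; // continue with the next layer in
            break;
        default:
            // This layer is not an array, so we are done.
            return out;
        }
    }
}
```

For a 5-element array of 3-element arrays this produces `[5][3]`, matching the example in the comment; the pre-change code would have stopped after the first bracket.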

@@ -5752,16 +5789,6 @@ struct EmitVisitor
         emit(";\n");
     }
 
-    IRType* unwrapArray(IRType* type)
-    {
-        IRType* t = type;
-        while( auto arrayType = as<IRArrayTypeBase>(t) )
-        {
-            t = arrayType->getElementType();
-        }
-        return t;
-    }
-
     void emitIRStructuredBuffer_GLSL(
         EmitContext* ctx,
         IRGlobalParam* varDecl,
@@ -6546,6 +6573,46 @@ String emitEntryPoint(
 #endif
     validateIRModuleIfEnabled(compileRequest, irModule);
 
+    // After type legalization and subsequent SSA cleanup we expect
+    // that any resource types passed to functions are exposed
+    // as their own top-level parameters (which might have
+    // resource or array-of-...-resource types).
+    //
+    // Many of our targets place restrictions on how certain
+    // resource types can be used, so that having them as
+    // function parameters is invalid. To clean this up,
+    // we will try to specialize called functions based
+    // on the actual resources that are being passed to them
+    // at specific call sites.
+    //
+    // Because the legalization may depend on what target
+    // we are compiling for (certain things might be okay
+    // for D3D targets that are not okay for Vulkan), we
+    // pass down the target request along with the IR.
+    //
+    specializeResourceParameters(compileRequest, targetRequest, irModule);
+
+#if 0
+    dumpIRIfEnabled(compileRequest, irModule, "AFTER RESOURCE SPECIALIZATION");
+#endif
+    validateIRModuleIfEnabled(compileRequest, irModule);
+
+    // The resource-based specialization pass above
+    // may create specialized versions of functions, but
+    // it does not try to completely eliminate the original
+    // functions, so there might still be invalid code in
+    // our IR module.
+    //
+    // To clean up the code, we will apply a fairly general
+    // dead-code-elimination (DCE) pass that only retains
+    // whatever code is "live."
+    //
+    eliminateDeadCode(compileRequest, irModule);
+#if 0
+    dumpIRIfEnabled(compileRequest, irModule, "AFTER DCE");
+#endif
+    validateIRModuleIfEnabled(compileRequest, irModule);
+
     // After all of the required optimization and legalization
     // passes have been performed, we can emit target code from
     // the IR module.
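The liveness-based DCE that `eliminateDeadCode` is described as performing can be sketched in a few lines of standalone C++. This is an illustration under assumptions, not the contents of `ir-dce.cpp`: `ToyInst`, `computeLiveness`, and `survivingInstructions` are hypothetical names, instructions are identified by index, the `isEntryPoint` flag stands in for `IREntryPointDecoration`, and the `dependsOn` list stands in for `IRDependsOnDecoration`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction, identified by its index in a vector.
struct ToyInst
{
    bool isEntryPoint = false;          // stands in for IREntryPointDecoration
    std::vector<std::size_t> operands;  // instructions this one actually uses
    std::vector<std::size_t> dependsOn; // stands in for IRDependsOnDecoration
};

// Mark phase: start from entry points and propagate liveness through
// both real uses and "depends on" edges.
std::vector<bool> computeLiveness(const std::vector<ToyInst>& insts)
{
    std::vector<bool> live(insts.size(), false);
    std::vector<std::size_t> workList;

    auto markLive = [&](std::size_t index)
    {
        assert(index < insts.size());
        if (!live[index])
        {
            live[index] = true;
            workList.push_back(index);
        }
    };

    for (std::size_t i = 0; i < insts.size(); ++i)
    {
        if (insts[i].isEntryPoint)
            markLive(i);
    }

    while (!workList.empty())
    {
        std::size_t index = workList.back();
        workList.pop_back();
        for (std::size_t operand : insts[index].operands)
            markLive(operand);
        for (std::size_t dep : insts[index].dependsOn)
            markLive(dep);
    }
    return live;
}

// Sweep phase: a single pass that keeps only the live instructions.
std::vector<std::size_t> survivingInstructions(const std::vector<ToyInst>& insts)
{
    std::vector<bool> live = computeLiveness(insts);
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < insts.size(); ++i)
    {
        if (live[i])
            kept.push_back(i);
    }
    return kept;
}
```

In this model a module with no entry-point flag anywhere loses every instruction, which mirrors the "eliminated *everything*" behavior the commit message reports before roots were added.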
