Commit 723c9b1

Author: Tim Foley
Remove KernelContext wrapper from CPU/CUDA emit (shader-slang#1440)
* Remove KernelContext wrapper from CPU/CUDA emit

Currently, the CPU and CUDA C++ targets rely on a `KernelContext` type that is generated during emit, as a way to provide implicit access to things that were global in the input Slang code, but that can't actually be emitted as globals in the target language (because the semantics of global declarations differ).

For example, input like:

```hlsl
ConstantBuffer<Stuff> gStuff;   // shader parameter
groupshared int gData[1024];    // thread-group shared variable
static int gCounter = 0;        // "thread-local" global-scope variable

void subroutine() { ... }

[shader("compute")]
void computeMain() { ... }
```

would translate to output C++ for CPU a bit like:

```c++
struct KernelContext
{
    ConstantBuffer<Stuff> gStuff;
    int gData[1024];
    int gCounter = 0;

    void subroutine() { ... }

    void computeMain() { ... }
};
```

Note that both `computeMain()` and `subroutine()` are non-`static` member functions of `KernelContext`, so they have an implicit `this` parameter of type `KernelContext*`, which allows their bodies to reference `gStuff`, etc. by name.

Because `KernelContext::computeMain()` is a member function, we end up emitting an additional global-scope function to expose the entry point to the outside world, and that function is responsible for declaring a local `KernelContext` and invoking the generated entry point on it.

This approach has several important drawbacks:

* It complicates the emit logic for CPU and CUDA, with many special cases around when/how things get emitted.

* It complicates the implementation of dynamic dispatch, because what seems like a function pointer in Slang IR needs to be a pointer-to-member-function in C++.

* It makes it difficult to have a non-kernel-oriented mode of compilation for CPU where a Slang function with a given signature gets output as a C++ CPU function with the "same" signature (not wrapped up as a member function of `KernelContext`).
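To make the wrapper and the dynamic-dispatch drawback concrete, here is a minimal, self-contained sketch of the old scheme. It is not the code Slang actually generated: `Stuff`, `computeMain_wrapper`, `ContextMethod`, and `callThroughMemberPointer` are hypothetical names, a plain pointer stands in for `ConstantBuffer<Stuff>`, and the functions return the counter only so that the behavior is observable.

```cpp
#include <cassert>

// Hypothetical stand-in for the shader parameter type from the example above.
struct Stuff { int scale; };

// Old scheme: former globals become fields, and every function becomes a
// non-static member so it can reach those fields through `this`.
struct KernelContext
{
    Stuff* gStuff = nullptr;   // was: ConstantBuffer<Stuff> gStuff
    int gCounter = 0;          // was: static int gCounter = 0

    void subroutine() { gCounter += gStuff->scale; }
    void computeMain() { subroutine(); subroutine(); }
};

// Global-scope wrapper that exposes the entry point: it declares a local
// context, fills in the parameters, and invokes the member function on it.
int computeMain_wrapper(void* globalParams)
{
    assert(globalParams != nullptr);
    KernelContext context = {};
    context.gStuff = static_cast<Stuff*>(globalParams);
    context.computeMain();
    return context.gCounter;
}

// What looks like a function pointer in the IR has to be a
// pointer-to-member-function here, with its own call syntax.
using ContextMethod = void (KernelContext::*)();

int callThroughMemberPointer(KernelContext& context, ContextMethod method)
{
    (context.*method)();
    return context.gCounter;
}
```

Calling `computeMain_wrapper(&stuff)` with `stuff.scale == 3` runs `subroutine()` twice against a fresh context, while `callThroughMemberPointer` shows the `(context.*method)()` syntax that an indirect call needs under this scheme.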

This change makes a step toward addressing these issues by making the introduction of the `KernelContext` type something that is done in an explicit IR pass instead of being handled as part of the last-mile emit logic.

The most important change is the removal of code related to `KernelContext` from the `slang-emit-{cpp,cuda}.{h,cpp}` files, with the equivalent logic instead being handled in a new pass in `slang-ir-explicit-global-context.{h,cpp}`.

It should be noted that further cleanups to the emit logic should now be possible; in particular, both the CPU and CUDA emit paths are manually sequencing the `EmitAction`s instead of relying on the default logic, but at this point they should be able to just use the default. The additional cleanups are left for future work.

The explicit IR pass does more or less what one would expect: it identifies global-scope entities (global variables and parameters) that need to be wrapped and turns them into fields of a `KernelContext` type. It then modifies all entry points to initialize a `KernelContext` as part of their startup. Finally, any code that used to refer to the global entities is changed to refer to a field of the context, with the context passed via new function parameters (for now, the new parameter is only added to functions that need it).

Transforming global variables into fields of a `KernelContext` type in the IR pass ends up dropping their initial-value expressions (since those were attached as basic blocks on the `IRGlobalVar`). To avoid breaking code that relies on global-scope (but thread-local) variables, this change also adds an explicit pass that takes the initialization logic on all global variables and moves it to explicit logic that runs at the start of every entry point in a linked module (`slang-ir-explicit-global-init.{h,cpp}`). This pass would also be useful when we get back to direct SPIR-V emit, since SPIR-V also requires initialization logic for globals to be emitted into entry points.
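As a rough illustration of the shape of the code after the explicit-context and explicit-init passes, the sketch below rewrites a small example by hand. It is an assumption about the general form, not actual compiler output: `Stuff`, `pureHelper`, and the `int` return value are hypothetical and exist only so the result can be checked.

```cpp
#include <cassert>

// Hypothetical stand-in for a shader parameter type.
struct Stuff { int scale; };

// After the pass the context is a plain struct of fields with no member
// functions and no default member initializers.
struct KernelContext
{
    Stuff* gStuff;    // was a global-scope shader parameter
    int gCounter;     // was: static int gCounter = 0
};

// Functions that touch former globals receive the context explicitly.
void subroutine(KernelContext* context)
{
    context->gCounter += context->gStuff->scale;
}

// A function that never touches a global keeps its original signature.
int pureHelper(int x) { return x * 2; }

// The entry point builds the context and runs the global initializers
// itself before executing the original body.
int computeMain(Stuff* globalParams)
{
    assert(globalParams != nullptr);
    KernelContext context;
    context.gStuff = globalParams;
    context.gCounter = 0;     // explicit global init moved into the entry point

    subroutine(&context);
    subroutine(&context);
    return pureHelper(context.gCounter);
}

// Indirect calls can now use an ordinary function pointer type.
using ContextFunc = void (*)(KernelContext*);
```

With `scale == 5`, `computeMain` accumulates 10 in the counter and returns 20. Because `subroutine` is now a free function, `ContextFunc f = &subroutine;` is an ordinary function pointer, which is what dynamic dispatch wants.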

One complication that arises when the IR is introducing the types for entry-point parameters, global-scope parameters, and the `KernelContext` type is that it becomes harder for the emit logic to utter the names of those types (they might not even have names, since `IRNameHint`s might get stripped). This created a problem since the wrapper operations that were being generated for CPU were taking `void*` parameters and casting them to the appropriate type. To work around this issue, we have added an explicit IR pass (`slang-ir-entry-point-raw-ptr-params.{h,cpp}`) that transforms the signature of entry points so that any pointer parameters instead become raw pointer (`void*`) parameters, with the casting being handled inside the entry point itself.

One consequence of all the above changes is that for the CUDA target we no longer need a wrapper function to invoke the generated entry point, because the IR function for the entry point ends up having the correct/expected signature already. This is also the case for CPU when it comes to the `*_Thread` wrapper function, but this change doesn't try to eliminate the wrapper because of a belief that the `*_Thread`-level interface is going away anyway.

Because the IR is now responsible for ensuring the signature of the IR entry point for CUDA and CPU is what is expected, I needed to modify the `slang-ir-entry-point-uniforms` pass to always create an explicit parameter for the entry point uniforms when compiling for CUDA/CPU, even if there were no `uniform` parameters on the entry point as written. This also ended up requiring some tweaks to the parameter layout logic to ensure that CPU/CUDA targets always treat `ConstantBuffer<T>` as a `T*` even in the case where `T` is an empty `struct` type (which happens when we construct a `struct` type to represent the uniform parameters of an entry point with no uniform parameters...).
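The raw-pointer transformation described above can be sketched as follows. This is a hand-written approximation under stated assumptions: `EntryPointParams`, `GlobalParams`, and `computeMainImpl` are hypothetical names, and `ComputeThreadVaryingInput` is reduced to a single field here rather than matching the real prelude type.

```cpp
#include <cassert>

// Hypothetical parameter block types; in real output these may be unnamed.
struct EntryPointParams { int offset; };
struct GlobalParams { int* data; };

// Simplified stand-in for the per-thread varying input.
struct ComputeThreadVaryingInput { unsigned groupThreadID; };

// Entry point after the pass: pointer parameters arrive as `void*` and the
// casts to the real types happen inside the function body.
void computeMainImpl(
    ComputeThreadVaryingInput* varyingInput,
    void* rawEntryPointParams,
    void* rawGlobalParams)
{
    assert(varyingInput && rawEntryPointParams && rawGlobalParams);
    auto* entryPointParams = static_cast<EntryPointParams*>(rawEntryPointParams);
    auto* globalParams = static_cast<GlobalParams*>(rawGlobalParams);

    const unsigned tid = varyingInput->groupThreadID;
    globalParams->data[tid] = entryPointParams->offset + static_cast<int>(tid);
}

// The `*_Thread` wrapper only forwards pointers, so the emitter never has
// to spell the names of the parameter types.
void computeMain_Thread(
    ComputeThreadVaryingInput* varyingInput,
    void* entryPointParams,
    void* globalParams)
{
    computeMainImpl(varyingInput, entryPointParams, globalParams);
}
```

The forwarding call mirrors the `(varyingInput, entryPointParams, globalParams)` argument list that the updated emitter writes in `slang-emit-cpp.cpp`.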

There are several future changes that can/should build on this work:

* We should change the generated signatures for CUDA kernels, so that they don't rely on `KernelContext` for global-scope parameters. At that point we can avoid generating a `KernelContext` at all for CUDA, except when a program uses global-scope thread-local variables.

* We should figure out how to make the "ABI" for dynamic-dispatch calls ensure that the kernel context is either always passed, or always *not* passed. Making a hard-and-fast rule as part of the calling convention for dynamic calls would ensure that access through the context continues to work with dynamic calls (this change might break it in some cases).

* We should figure out how to handle the layout for the `KernelContext` in cases where a program is composed of multiple separately-compiled modules. Right now the layout of the `KernelContext` requires global knowledge (as does the pass that introduces explicit initialization for global-scope thread-locals).

* We should try to further clean up the CPU/CUDA C++ emit logic to fall back on the default emit behavior more, now that the various special-case approaches that were taken are no longer needed.

* fixup: restore build files to default configuration

1 parent 48f26ef commit 723c9b1

19 files changed: +1329 -537 lines changed

source/slang/slang-emit-c-like.cpp

+12 -8
@@ -2341,16 +2341,20 @@ void CLikeSourceEmitter::defaultEmitInstExpr(IRInst* inst, const EmitOpInfo& inO
 
 case kIROp_BitCast:
 {
-    // TODO: we can simplify the logic for arbitrary bitcasts
-    // by always bitcasting the source to a `uint*` type (if it
-    // isn't already) and then bitcasting that to the destination
-    // type (if it isn't already `uint*`.
+    // Note: we are currently emitting casts as plain old
+    // C-style casts, which may not always perform a bitcast.
     //
-    // For now we are assuming the source type is *already*
-    // a `uint*` type of the appropriate size.
-    //
-    // auto fromType = extractBaseType(inst->getOperand(0)->getDataType());
+    // TODO: This operation should map to an intrinsic to be
+    // provided in a prelude for C/C++, so that the target
+    // can easily emit code for whatever the best possible
+    // bitcast is on the platform.
 
+    auto prec = getInfo(EmitOp::Prefix);
+    needClose = maybeEmitParens(outerPrec, prec);
+
+    m_writer->emit("(");
+    emitType(inst->getDataType());
+    m_writer->emit(")");
     m_writer->emit("(");
     emitOperand(inst->getOperand(0), getInfo(EmitOp::General));
     m_writer->emit(")");
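
The new comment in this hunk notes that a C-style cast may not perform a bitcast. A minimal sketch of the difference, assuming a 32-bit IEEE 754 `float`: `cStyleCast`, `bitCast`, and `checkRoundTrip` are hypothetical helpers written for this illustration, not functions from the Slang prelude, and the `memcpy` idiom is only one portable way such a prelude intrinsic might be written.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A C-style cast between arithmetic types converts the *value*:
// 1.0f becomes the integer 1.
inline std::uint32_t cStyleCast(float value)
{
    return (std::uint32_t)(value);
}

// A bitcast reinterprets the *bytes*. Copying through memcpy is a portable
// way to do that without violating aliasing rules.
template <typename To, typename From>
inline To bitCast(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "bitCast requires equal sizes");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Round-tripping through the bit pattern preserves a (non-NaN) value.
inline void checkRoundTrip(float value)
{
    assert(bitCast<float>(bitCast<std::uint32_t>(value)) == value);
}
```

For `1.0f`, `cStyleCast` yields `1` while `bitCast<std::uint32_t>` yields the IEEE 754 bit pattern `0x3f800000`, which is the gap the TODO proposes to close with a prelude intrinsic.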

source/slang/slang-emit-cpp.cpp

+10 -158
@@ -1767,7 +1767,6 @@ void CPPSourceEmitter::_emitWitnessTableDefinitions()
     else
         isFirstEntry = false;
 
-    m_writer->emit("&KernelContext::");
     m_writer->emit(_getWitnessTableWrapperFuncName(funcVal));
 }
 else if (auto witnessTableVal = as<IRWitnessTable>(entry->getSatisfyingVal()))
@@ -1830,7 +1829,7 @@ void CPPSourceEmitter::_maybeEmitWitnessTableTypeDefinition(
 if (auto funcVal = as<IRFuncType>(entry->getRequirementVal()))
 {
     emitType(funcVal->getResultType());
-    m_writer->emit(" (KernelContext::*");
+    m_writer->emit(" (*");
     m_writer->emit(getName(entry->getRequirementKey()));
     m_writer->emit(")");
     m_writer->emit("(");
@@ -1964,8 +1963,7 @@ void CPPSourceEmitter::emitSimpleFuncImpl(IRFunc* func)
 // on CPU/CUDA, and these all bottleneck through the actual `IRFunc`
 // here as a workhorse.
 //
-// Because the workhorse function is currently emitted as a member of
-// `KernelContext`, and doesn't have the right signature to service
+// Because the workhorse function doesn't have the right signature to service
 // general-purpose calls, it is being emitted with a `_` prefix.
 //
 StringBuilder prefixName;
@@ -2288,15 +2286,6 @@ bool CPPSourceEmitter::tryEmitInstExprImpl(IRInst* inst, const EmitOpInfo& inOut
     // Does this function declare any requirements.
     handleCallExprDecorationsImpl(funcValue);
 
-    if (funcValue->op == kIROp_lookup_interface_method)
-    {
-        m_writer->emit("(this->*(");
-        emitOperand(funcValue, EmitOpInfo());
-        m_writer->emit("))");
-        _emitCallArgList(as<IRCall>(inst));
-        return true;
-    }
-
     // try doing automatically
     return _tryEmitInstExprAsIntrinsic(inst, inOuterPrec);
 }
@@ -2389,8 +2378,6 @@ void CPPSourceEmitter::emitPreprocessorDirectivesImpl()
     m_writer->emit("#ifdef SLANG_PRELUDE_NAMESPACE\n");
     m_writer->emit("using namespace SLANG_PRELUDE_NAMESPACE;\n");
     m_writer->emit("#endif\n\n");
-
-    m_writer->emit("struct KernelContext;\n\n");
 }
 
 if (m_target == CodeGenTarget::CSource)
@@ -2470,7 +2457,7 @@ static bool _isFunction(IROp op)
     return op == kIROp_Func;
 }
 
-void CPPSourceEmitter::_emitEntryPointDefinitionStart(IRFunc* func, IRGlobalParam* entryPointParams, IRGlobalParam* globalParams, const String& funcName, const UnownedStringSlice& varyingTypeName)
+void CPPSourceEmitter::_emitEntryPointDefinitionStart(IRFunc* func, const String& funcName, const UnownedStringSlice& varyingTypeName)
 {
     auto resultType = func->getResultType();
 
@@ -2488,31 +2475,6 @@ void CPPSourceEmitter::_emitEntryPointDefinitionStart(IRFunc* func, IRGlobalPara
     m_writer->emit("\n{\n");
 
     m_writer->indent();
-    // Initialize when constructing so that globals are zeroed
-    m_writer->emit("KernelContext context = {};\n");
-
-    if (entryPointParams)
-    {
-        auto param = entryPointParams;
-        auto paramType = param->getDataType();
-
-        m_writer->emit("context.");
-        m_writer->emit(getName(param));
-        m_writer->emit(" = (");
-        emitType(paramType);
-        m_writer->emit(")entryPointParams; \n");
-    }
-    if (globalParams)
-    {
-        auto param = globalParams;
-        auto paramType = param->getDataType();
-
-        m_writer->emit("context.");
-        m_writer->emit(getName(param));
-        m_writer->emit(" = (");
-        emitType(paramType);
-        m_writer->emit(")globalParams; \n");
-    }
 }
 
 void CPPSourceEmitter::_emitEntryPointDefinitionEnd(IRFunc* func)
@@ -2577,9 +2539,9 @@ void CPPSourceEmitter::_emitEntryPointGroup(const Int sizeAlongAxis[kThreadGroup
 }
 
 // just call at inner loop point
-m_writer->emit("context._");
+m_writer->emit("_");
 m_writer->emit(funcName);
-m_writer->emit("(&threadInput);\n");
+m_writer->emit("(&threadInput, entryPointParams, globalParams);\n");
 
 // Close all the loops
 for (Index i = Index(axes.getCount() - 1); i >= 0; --i)
@@ -2675,97 +2637,6 @@ void CPPSourceEmitter::_emitForwardDeclarations(const List<EmitAction>& actions)
     }
 }
 
-static bool isVaryingResourceKind(LayoutResourceKind kind)
-{
-    switch(kind)
-    {
-    default:
-        return false;
-
-    case LayoutResourceKind::VaryingInput:
-    case LayoutResourceKind::VaryingOutput:
-        return true;
-    }
-}
-
-static bool isVaryingParameter(IRTypeLayout* typeLayout)
-{
-    for(auto sizeAttr : typeLayout->getSizeAttrs())
-    {
-        if(!isVaryingResourceKind(sizeAttr->getResourceKind()))
-            return false;
-    }
-    return true;
-}
-
-static bool isVaryingParameter(IRVarLayout* varLayout)
-{
-    return isVaryingParameter(varLayout->getTypeLayout());
-}
-
-void CPPSourceEmitter::_findShaderParams(
-    IRGlobalParam** outEntryPointParam,
-    IRGlobalParam** outGlobalParam)
-{
-    SLANG_ASSERT(outEntryPointParam);
-    SLANG_ASSERT(outGlobalParam);
-
-    IRGlobalParam*& entryPointParam = *outEntryPointParam;
-    IRGlobalParam*& globalParam = *outGlobalParam;
-
-    for(auto inst : m_irModule->getGlobalInsts())
-    {
-        auto param = as<IRGlobalParam>(inst);
-        if(!param)
-            continue;
-
-        if(auto layoutDecor = param->findDecoration<IRLayoutDecoration>())
-        {
-            if(auto varLayout = as<IRVarLayout>(layoutDecor->getLayout()))
-            {
-                if(isVaryingParameter(varLayout))
-                    continue;
-                auto typeLayout = varLayout->getTypeLayout();
-                if(typeLayout->findSizeAttr(LayoutResourceKind::VaryingInput))
-                    continue;
-                if(typeLayout->findSizeAttr(LayoutResourceKind::VaryingOutput))
-                    continue;
-            }
-        }
-
-        // Currently, the entry-point parameters
-        // are represented as a single parameter
-        // at the global scope, and the same is
-        // true of the parameters that were
-        // originally declared as globals.
-        //
-        // We need to find capture each of these
-        // parameters, and we need to tell them
-        // apart. Luckily, the logic that
-        // moved the entry-point parameters to
-        // global scope will ahve also marked
-        // the entry-point parameters with
-        // a decoration that we can detect.
-        //
-        if (inst->findDecorationImpl(kIROp_EntryPointParamDecoration))
-        {
-            // Should only be one instruction marked this way
-            SLANG_ASSERT(entryPointParam == nullptr);
-            entryPointParam = param;
-            continue;
-        }
-        else
-        {
-            // There should only be one instruction representing
-            // the global-scope shader parameters.
-            //
-            SLANG_ASSERT(globalParam == nullptr);
-            globalParam = param;
-            continue;
-        }
-    }
-}
-
 void CPPSourceEmitter::emitModuleImpl(IRModule* module)
 {
     // Setup all built in types used in the module
@@ -2778,24 +2649,8 @@ void CPPSourceEmitter::emitModuleImpl(IRModule* module)
 
 _emitForwardDeclarations(actions);
 
-IRGlobalParam* entryPointParams = nullptr;
-IRGlobalParam* globalParams = nullptr;
-_findShaderParams(&entryPointParams, &globalParams);
 
-// Output the 'Context' which will be used for execution
 {
-    m_writer->emit("struct KernelContext\n{\n");
-    m_writer->indent();
-
-    if (globalParams)
-    {
-        emitGlobalInst(globalParams);
-    }
-    if (entryPointParams)
-    {
-        emitGlobalInst(entryPointParams);
-    }
-
     // Output all the thread locals
     for (auto action : actions)
     {
@@ -2818,9 +2673,6 @@ void CPPSourceEmitter::emitModuleImpl(IRModule* module)
     // These wrapper functions takes an abstract type parameter (void*)
     // in the place of `this` parameter.
     _emitWitnessTableWrappers();
-
-    m_writer->dedent();
-    m_writer->emit("};\n\n");
 }
 
 // Emit all witness table definitions.
@@ -2856,11 +2708,11 @@ void CPPSourceEmitter::emitModuleImpl(IRModule* module)
 
     String threadFuncName = builder;
 
-    _emitEntryPointDefinitionStart(func, entryPointParams, globalParams, threadFuncName, UnownedStringSlice::fromLiteral("ComputeThreadVaryingInput"));
+    _emitEntryPointDefinitionStart(func, threadFuncName, UnownedStringSlice::fromLiteral("ComputeThreadVaryingInput"));
 
-    m_writer->emit("context._");
+    m_writer->emit("_");
     m_writer->emit(funcName);
-    m_writer->emit("(varyingInput);\n");
+    m_writer->emit("(varyingInput, entryPointParams, globalParams);\n");
 
     _emitEntryPointDefinitionEnd(func);
 }
@@ -2873,7 +2725,7 @@ void CPPSourceEmitter::emitModuleImpl(IRModule* module)
 
 String groupFuncName = builder;
 
-_emitEntryPointDefinitionStart(func, entryPointParams, globalParams, groupFuncName, UnownedStringSlice::fromLiteral("ComputeVaryingInput"));
+_emitEntryPointDefinitionStart(func, groupFuncName, UnownedStringSlice::fromLiteral("ComputeVaryingInput"));
 
 m_writer->emit("ComputeThreadVaryingInput threadInput = {};\n");
 m_writer->emit("threadInput.groupID = varyingInput->startGroupID;\n");
@@ -2884,7 +2736,7 @@ void CPPSourceEmitter::emitModuleImpl(IRModule* module)
 
 // Emit the main version - which takes a dispatch size
 {
-    _emitEntryPointDefinitionStart(func, entryPointParams, globalParams, funcName, UnownedStringSlice::fromLiteral("ComputeVaryingInput"));
+    _emitEntryPointDefinitionStart(func, funcName, UnownedStringSlice::fromLiteral("ComputeVaryingInput"));
 
     m_writer->emit("ComputeVaryingInput vi = *varyingInput;\n");
     m_writer->emit("ComputeVaryingInput groupVaryingInput = {};\n");

source/slang/slang-emit-cpp.h

+1 -6
@@ -84,11 +84,6 @@ class CPPSourceEmitter: public CLikeSourceEmitter
 
 void _emitForwardDeclarations(const List<EmitAction>& actions);
 
-/// Find the IR global parameters representing the entry-point and global shader parameters (if any)
-void _findShaderParams(
-    IRGlobalParam** outEntryPointParam,
-    IRGlobalParam** outGlobalParam);
-
 void _emitAryDefinition(const HLSLIntrinsic* specOp);
 
 // Really we don't want any of these defined like they are here, they should be defined in slang stdlib
@@ -119,7 +114,7 @@ class CPPSourceEmitter: public CLikeSourceEmitter
 
 SlangResult _calcCPPTextureTypeName(IRTextureTypeBase* texType, StringBuilder& outName);
 
-void _emitEntryPointDefinitionStart(IRFunc* func, IRGlobalParam* entryPointParams, IRGlobalParam* globalParams, const String& funcName, const UnownedStringSlice& varyingTypeName);
+void _emitEntryPointDefinitionStart(IRFunc* func, const String& funcName, const UnownedStringSlice& varyingTypeName);
 void _emitEntryPointDefinitionEnd(IRFunc* func);
 void _emitEntryPointGroup(const Int sizeAlongAxis[kThreadGroupAxisCount], const String& funcName);
 void _emitEntryPointGroupRange(const Int sizeAlongAxis[kThreadGroupAxisCount], const String& funcName);
