Invalid intermediate CUDA code is generated with new compilation API #6507

aleino-nv · 2025-03-03T09:25:07Z

Here is a WIP branch that modifies one unit test to try to compile the shader from tests/compute/simple.slang with the new compilation API, targeting PTX.

Here is the invalid intermediate CUDA code that gets generated:

#line 2 "m.slang"
struct GlobalParams_0
{
    RWStructuredBuffer<float> outputBuffer_0;
};


#line 2
extern "C" __constant__ GlobalParams_0 SLANG_globalParams;
#define globalParams_0 (&SLANG_globalParams)

extern "C" __global__ void computeMain()
{

#line 5
    uint3  _S1 = blockIdx * blockDim + threadIdx;

    uint _S2 = (&_S1).x;

#line 7
    *(&(globalParams_0->outputBuffer_0[_S2])) = float(_S2);
    return;
}

The line uint _S2 = (&_S1).x; is invalid and does not compile with nvrtc:

diagnostic: nvrtc 12.8: nvrtc: warning : Architectures prior to '<compute/sm>_75' are deprecated and may be removed in a future release
nvrtc 12.8: m.slang(7): error : expression must have class type but it has type "uint3 *"
nvrtc 12.8: note :       uint _S2 = (&_S1).x;
nvrtc 12.8: note :

The text was updated successfully, but these errors were encountered:

aleino-nv · 2025-03-03T09:26:21Z

For reference, the following is generated with -g2 via the old compilation API:

#include "C:/src/slang/prelude/slang-cuda-prelude.h"

struct GlobalParams_0
{
    RWStructuredBuffer<float> outputBuffer_0;
};

extern "C" __constant__ GlobalParams_0 SLANG_globalParams;
#define globalParams_0 (&SLANG_globalParams)
extern "C" __global__ void computeMain()
{
    uint3  _S1 = blockIdx * blockDim + threadIdx;
    uint _S2 = _S1.x;
    *(&(globalParams_0->outputBuffer_0[_S2])) = float(_S2);
    return;
}

aleino-nv · 2025-03-03T11:13:56Z

Relevant IR snippet when compiling with the deprecated compile request API (via slangc):

{623}	func %computeMain	: Func(Void, ConstRef(Vec(UInt, 3 : Int)))
{
{679}	block %35:
	{846}	let  %36	: Vec(UInt, 3 : Int)	= mul(%37, %38)
	{847}	let  %39	: Vec(UInt, 3 : Int)	= add(%36, %40)
	{684}	DebugLine(%34, 22 : UInt, 22 : UInt, 5 : UInt, 6 : UInt)
	{685}	DebugLine(%34, 22 : UInt, 22 : UInt, 5 : UInt, 6 : UInt)
	{686}	let  %41	: UInt	= swizzle(%39, 0 : Int)
	{771}	let  %42	: Ptr(RWStructuredBuffer(Float, DefaultLayout, %19))	= get_field_addr(%globalParams, %outputBuffer)
	{862}	let  %43	: RWStructuredBuffer(Float, DefaultLayout, %19)	= load(%42)
	{725}	let  %44	: Ptr(Float)	= rwstructuredBufferGetElementPtr(%43, %41)
	{727}	let  %45	: Float	= castIntToFloat(%41)
	{728}	store(%44, %45)
	{731}	return_val(void_constant)
}

Relevant IR snippet when compiling the hacked unit test with the new API:

{359}	func %computeMain	: Func(Void, Vec(UInt, 3 : Int))
{
{402}	block %34:
	{565}	let  %35	: Vec(UInt, 3 : Int)	= mul(%36, %37)
	{566}	let  %38	: Vec(UInt, 3 : Int)	= add(%35, %39)
	{579}	let  %40	: Ptr(Vec(UInt, 3 : Int))	= var
	{580}	store(%40, %38)
	{404}	let  %41	: UInt	= swizzle(%40, 0 : Int)
	{489}	let  %42	: Ptr(RWStructuredBuffer(Float, DefaultLayout, %19))	= get_field_addr(%globalParams, %outputBuffer)
	{581}	let  %43	: RWStructuredBuffer(Float, DefaultLayout, %19)	= load(%42)
	{440}	let  %44	: Ptr(Float)	= rwstructuredBufferGetElementPtr(%43, %41)
	{442}	let  %45	: Float	= castIntToFloat(%41)
	{443}	store(%44, %45)
	{446}	return_val(void_constant)
}

Doing a swizzle on a ptr type directly leads to generating the problematic code.
I wonder if for some reason the deprecated API is causing some optimization to run, which hides the codegen bug.

Of course we still need to fix the bug, but it would be good to find out the answer to this question before continuing with the original task #4760.

aleino-nv · 2025-03-03T12:33:11Z

I found that legalizeEntryPointVaryingParamsForCUDA introduces the swizzle of the ptr in the IR, but I haven't pinpointed exactly where yet.

aleino-nv · 2025-03-03T13:56:16Z

I found that legalizeEntryPointVaryingParamsForCUDA introduces the swizzle of the ptr in the IR, but I haven't pinpointed exactly where yet.

It's from here, but I believe the more promising lead right now is the fact that old API produces an entry point with the signature
func %computeMain : Func(Void, ConstRef(Vec(UInt, 3 : Int)))
whereas the new API produces the signature
func %computeMain : Func(Void, Vec(UInt, 3 : Int)).

(The ptr being swizzled is a temporary variable containing the function parameter, which is actually uint3 dispatchThreadID : SV_DispatchThreadID.)

aleino-nv · 2025-03-04T08:01:45Z

The difference in the entry point parameter being constref vs not in old vs new API results comes from slang-lower-to-ir.cpp/collectParameterLists via bool lowerVaryingInputAsConstRef = declRef.getDecl()->hasModifier<EntryPointAttribute>(); getting values true/false.

In other words, for some reason the new API entrypoint does not have the EntryPointAttribute decoration, whereas the old one has.
The shader doesn't have the explicit [shader("compute")] attribute.

There is a comment explaining that constref is needed, so I assume the fix is to make sure that the new API will also end up adding the attribute to the entry point.

EDIT: On second thought, maybe the AST should more reflect what the user wrote. I can just look for other attributes that indicate the function is an entrypoint, instead. For example numthreads in this case.

…wring For shaders like tests/compute/simple.slang, which have a 'numthreads' attribute but no 'shader' attribute, the old compile request API would add an EntryPointAttribute to the AST node of the entry point. However, the new API doesn't, and so a certain ConstRef hack doesn't get applied when using the new API, leading to subsequent code generation issues. This patch also checks for a 'numthreads' attribute when deciding whether to apply the ConstRef hack. This closes issue shader-slang#6507 and helps to resolve issue shader-slang#4760.

* Add cuda codegen bug repro This just compiles tests/compute/simlpe.slang for PTX with the new compilation API, in order to reproduce a code generation bug. * Detect entrypoint more robustly when applying ConstRef hack during lowring For shaders like tests/compute/simple.slang, which have a 'numthreads' attribute but no 'shader' attribute, the old compile request API would add an EntryPointAttribute to the AST node of the entry point. However, the new API doesn't, and so a certain ConstRef hack doesn't get applied when using the new API, leading to subsequent code generation issues. This patch also checks for a 'numthreads' attribute when deciding whether to apply the ConstRef hack. This closes issue #6507 and helps to resolve issue #4760. * Add expected failure list for GitHub runners Our GitHub runners don't have the CUDA toolkits installed, so they can't run all tests.

aleino-nv · 2025-03-06T12:36:38Z

Fixed by #6506

aleino-nv mentioned this issue Mar 3, 2025

Move slang-test and render-test to use the new compilation API #4760

Closed

aleino-nv self-assigned this Mar 3, 2025

aleino-nv added goal:quality & productivity Quality issues and issues that impact our productivity coding day to day inside slang kind:bug something doesn't work like it should labels Mar 4, 2025

aleino-nv mentioned this issue Mar 4, 2025

Fix codegen bug when targeting PTX with new API #6506

Merged

aleino-nv closed this as completed Mar 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invalid intermediate CUDA code is generated with new compilation API #6507

Invalid intermediate CUDA code is generated with new compilation API #6507

aleino-nv commented Mar 3, 2025

aleino-nv commented Mar 3, 2025

aleino-nv commented Mar 3, 2025

aleino-nv commented Mar 3, 2025

aleino-nv commented Mar 3, 2025

aleino-nv commented Mar 4, 2025 •

edited

Loading

aleino-nv commented Mar 6, 2025

Invalid intermediate CUDA code is generated with new compilation API #6507

Invalid intermediate CUDA code is generated with new compilation API #6507

Comments

aleino-nv commented Mar 3, 2025

aleino-nv commented Mar 3, 2025

aleino-nv commented Mar 3, 2025

aleino-nv commented Mar 3, 2025

aleino-nv commented Mar 3, 2025

aleino-nv commented Mar 4, 2025 • edited Loading

aleino-nv commented Mar 6, 2025

aleino-nv commented Mar 4, 2025 •

edited

Loading