Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid intermediate CUDA code is generated with new compilation API #6507

Closed
aleino-nv opened this issue Mar 3, 2025 · 6 comments
Closed
Assignees
Labels
goal:quality & productivity Quality issues and issues that impact our productivity coding day to day inside slang kind:bug something doesn't work like it should

Comments

@aleino-nv
Copy link
Collaborator

Here is a WIP branch that modifies one unit test to try to compile the shader from tests/compute/simple.slang with the new compilation API, targeting PTX.

Here is the invalid intermediate CUDA code that gets generated:

#line 2 "m.slang"
struct GlobalParams_0
{
    RWStructuredBuffer<float> outputBuffer_0;
};


#line 2
extern "C" __constant__ GlobalParams_0 SLANG_globalParams;
#define globalParams_0 (&SLANG_globalParams)

extern "C" __global__ void computeMain()
{

#line 5
    uint3  _S1 = blockIdx * blockDim + threadIdx;

    uint _S2 = (&_S1).x;

#line 7
    *(&(globalParams_0->outputBuffer_0[_S2])) = float(_S2);
    return;
}

The line uint _S2 = (&_S1).x; is invalid and does not compile with nvrtc:

diagnostic: nvrtc 12.8: nvrtc: warning : Architectures prior to '<compute/sm>_75' are deprecated and may be removed in a future release
nvrtc 12.8: m.slang(7): error : expression must have class type but it has type "uint3 *"
nvrtc 12.8: note :       uint _S2 = (&_S1).x;
nvrtc 12.8: note :             
@aleino-nv
Copy link
Collaborator Author

For reference, the following is generated with -g2 via the old compilation API:

#include "C:/src/slang/prelude/slang-cuda-prelude.h"

struct GlobalParams_0
{
    RWStructuredBuffer<float> outputBuffer_0;
};

extern "C" __constant__ GlobalParams_0 SLANG_globalParams;
#define globalParams_0 (&SLANG_globalParams)
extern "C" __global__ void computeMain()
{
    uint3  _S1 = blockIdx * blockDim + threadIdx;
    uint _S2 = _S1.x;
    *(&(globalParams_0->outputBuffer_0[_S2])) = float(_S2);
    return;
}

@aleino-nv
Copy link
Collaborator Author

Relevant IR snippet when compiling with the deprecated compile request API (via slangc):

{623}	func %computeMain	: Func(Void, ConstRef(Vec(UInt, 3 : Int)))
{
{679}	block %35:
	{846}	let  %36	: Vec(UInt, 3 : Int)	= mul(%37, %38)
	{847}	let  %39	: Vec(UInt, 3 : Int)	= add(%36, %40)
	{684}	DebugLine(%34, 22 : UInt, 22 : UInt, 5 : UInt, 6 : UInt)
	{685}	DebugLine(%34, 22 : UInt, 22 : UInt, 5 : UInt, 6 : UInt)
	{686}	let  %41	: UInt	= swizzle(%39, 0 : Int)
	{771}	let  %42	: Ptr(RWStructuredBuffer(Float, DefaultLayout, %19))	= get_field_addr(%globalParams, %outputBuffer)
	{862}	let  %43	: RWStructuredBuffer(Float, DefaultLayout, %19)	= load(%42)
	{725}	let  %44	: Ptr(Float)	= rwstructuredBufferGetElementPtr(%43, %41)
	{727}	let  %45	: Float	= castIntToFloat(%41)
	{728}	store(%44, %45)
	{731}	return_val(void_constant)
}

Relevant IR snippet when compiling the hacked unit test with the new API:

{359}	func %computeMain	: Func(Void, Vec(UInt, 3 : Int))
{
{402}	block %34:
	{565}	let  %35	: Vec(UInt, 3 : Int)	= mul(%36, %37)
	{566}	let  %38	: Vec(UInt, 3 : Int)	= add(%35, %39)
	{579}	let  %40	: Ptr(Vec(UInt, 3 : Int))	= var
	{580}	store(%40, %38)
	{404}	let  %41	: UInt	= swizzle(%40, 0 : Int)
	{489}	let  %42	: Ptr(RWStructuredBuffer(Float, DefaultLayout, %19))	= get_field_addr(%globalParams, %outputBuffer)
	{581}	let  %43	: RWStructuredBuffer(Float, DefaultLayout, %19)	= load(%42)
	{440}	let  %44	: Ptr(Float)	= rwstructuredBufferGetElementPtr(%43, %41)
	{442}	let  %45	: Float	= castIntToFloat(%41)
	{443}	store(%44, %45)
	{446}	return_val(void_constant)
}

Doing a swizzle on a ptr type directly leads to generating the problematic code.
I wonder if for some reason the deprecated API is causing some optimization to run, which hides the codegen bug.

Of course we still need to fix the bug, but it would be good to find out the answer to this question before continuing with the original task #4760.

@aleino-nv
Copy link
Collaborator Author

I found that legalizeEntryPointVaryingParamsForCUDA introduces the swizzle of the ptr in the IR, but I haven't pinpointed exactly where yet.

@aleino-nv
Copy link
Collaborator Author

I found that legalizeEntryPointVaryingParamsForCUDA introduces the swizzle of the ptr in the IR, but I haven't pinpointed exactly where yet.

It's from here, but I believe the more promising lead right now is the fact that old API produces an entry point with the signature
func %computeMain : Func(Void, ConstRef(Vec(UInt, 3 : Int)))
whereas the new API produces the signature
func %computeMain : Func(Void, Vec(UInt, 3 : Int)).

(The ptr being swizzled is a temporary variable containing the function parameter, which is actually uint3 dispatchThreadID : SV_DispatchThreadID.)

@aleino-nv aleino-nv added goal:quality & productivity Quality issues and issues that impact our productivity coding day to day inside slang kind:bug something doesn't work like it should labels Mar 4, 2025
@aleino-nv
Copy link
Collaborator Author

aleino-nv commented Mar 4, 2025

The difference in the entry point parameter being constref vs not in old vs new API results comes from slang-lower-to-ir.cpp/collectParameterLists via bool lowerVaryingInputAsConstRef = declRef.getDecl()->hasModifier<EntryPointAttribute>(); getting values true/false.

In other words, for some reason the new API entrypoint does not have the EntryPointAttribute decoration, whereas the old one has.
The shader doesn't have the explicit [shader("compute")] attribute.

There is a comment explaining that constref is needed, so I assume the fix is to make sure that the new API will also end up adding the attribute to the entry point.

EDIT: On second thought, maybe the AST should more reflect what the user wrote. I can just look for other attributes that indicate the function is an entrypoint, instead. For example numthreads in this case.

aleino-nv added a commit to aleino-nv/slang that referenced this issue Mar 4, 2025
…wring

For shaders like tests/compute/simple.slang, which have a 'numthreads' attribute but no
'shader' attribute, the old compile request API would add an EntryPointAttribute to the
AST node of the entry point. However, the new API doesn't, and so a certain ConstRef hack
doesn't get applied when using the new API, leading to subsequent code generation issues.

This patch also checks for a 'numthreads' attribute when deciding whether to apply the
ConstRef hack.

This closes issue shader-slang#6507 and helps to resolve issue shader-slang#4760.
aleino-nv added a commit to aleino-nv/slang that referenced this issue Mar 4, 2025
…wring

For shaders like tests/compute/simple.slang, which have a 'numthreads' attribute but no
'shader' attribute, the old compile request API would add an EntryPointAttribute to the
AST node of the entry point. However, the new API doesn't, and so a certain ConstRef hack
doesn't get applied when using the new API, leading to subsequent code generation issues.

This patch also checks for a 'numthreads' attribute when deciding whether to apply the
ConstRef hack.

This closes issue shader-slang#6507 and helps to resolve issue shader-slang#4760.
aleino-nv added a commit that referenced this issue Mar 5, 2025
* Add cuda codegen bug repro

This just compiles tests/compute/simlpe.slang for PTX with the new compilation API, in
order to reproduce a code generation bug.

* Detect entrypoint more robustly when applying ConstRef hack during lowring

For shaders like tests/compute/simple.slang, which have a 'numthreads' attribute but no
'shader' attribute, the old compile request API would add an EntryPointAttribute to the
AST node of the entry point. However, the new API doesn't, and so a certain ConstRef hack
doesn't get applied when using the new API, leading to subsequent code generation issues.

This patch also checks for a 'numthreads' attribute when deciding whether to apply the
ConstRef hack.

This closes issue #6507 and helps to resolve issue #4760.

* Add expected failure list for GitHub runners

Our GitHub runners don't have the CUDA toolkits installed, so they can't run all tests.
@aleino-nv
Copy link
Collaborator Author

Fixed by #6506

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
goal:quality & productivity Quality issues and issues that impact our productivity coding day to day inside slang kind:bug something doesn't work like it should
Projects
None yet
Development

No branches or pull requests

1 participant