Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change SPIR-V lowering of switch with non-trivial fallthrough to avoid undefined reconvergence behavior #6441

Open
llvm-beanz opened this issue Feb 24, 2025 · 12 comments
Labels
goal:quality & productivity Quality issues and issues that impact our productivity coding day to day inside slang kind:bug something doesn't work like it should

Comments

@llvm-beanz
Copy link

Given this shader:

struct Contents {
  int X;
};

RWStructuredBuffer<Contents> value;

[numthreads(4, 1, 1)]
void main(uint3 threadID : SV_DispatchThreadID) {
  uint sum = 0;
  switch (value[threadID.x].X) {
    case 0:
      sum += WaveActiveSum(1);
      // Intentional fallthrough!
    default:
      sum += WaveActiveSum(10);
      break;
  }
  value[threadID.x].X = sum;
}

Compiler Explorer

Slang currently generates the switch statement as an OpSwitch instruction which has undefined behavior for reconvergence of cases with fall through.

Is this expected to be undefined behavior in Slang?

Additional Context from HLSL

The HLSL draft specification states in section [Intro.Model.Restrictions]:

An optimizing compiler may not optimize code generation such that it changes the behavior of a well-formed program except in the presence of implementation-defined or unspecified behavior.

The presence of Wave, Quad, or Thread Group operations may further limit the valid transformations of a program. Specifically, control flow operations which result in changing which Lanes, Quads, or Waves are actively executing are illegal in the presence of cooperative operations if the optimization alters the behavior of the program.

This language is specifically intended to disallow restructuring of control flow and to force reconvergence at the earliest possible point.

We're tracking an alternate SPIR-V lowering for switch statements in llvm/llvm-project#112056.

@csyonghe
Copy link
Collaborator

If SPIRV doesn't provide a well-defined semantics for fallthrough cases, we will have to leave the semantics of switch undefined for now in Slang. It feels too costly to not use OpSwitch because switch statements exist for a performance reason, and not just a convenient alternative to nesting if.

We could probably improve this by duplicating SPIRV code for cases that has fallthrough, but we will not be lowering all switch to if.

@tangent-vector
Copy link
Contributor

Developers have an intuitive notion that a C-like switch construct should map to the SPIR-V OpSwitch, and to do anything different would violate the "principle of least surprise."

Our default disposition for Slang is currently to map high-level-language constructs to their "natural" equivalent in a target language whenever that is possible/practical. This is especially true when it is possible that using the natural/obvious mapping might afford compilers downstream from Slang (e.g., in GPU drivers) more opportunities for optimization, or enable those compilers to do less work in discovering opportunities for such optimization.

Of course we also need to be concerned about giving developers enough of a foundation that they can write portable code that is also semantically meaningful. If there is a significant body of evidence that the performance cost of mapping switch constructs differently is not that bad, and/or that the consequences of leaving the divergence/reconvergence behavior of switch constructs under-constrained are severe, then we could be convinced to change our plans.

For now, the intended divergence/reconvergence guarantees for switches in Slang is intended to match SPIR-V's OpSwitch.

@llvm-beanz
Copy link
Author

You all get to make decisions how you see fit, but this logic seems confused to me.

Developers have an intuitive notion that a C-like switch construct should map to the SPIR-V OpSwitch, and to do anything different would violate the "principle of least surprise."

We polled users and found that overwhelmingly they assume switch statements reconverge at fallthrough. I don't think most developers writing shaders are experts at either SPIRV or DXIL, certainly not to the degree to assume that switch means OpSwitch or to understand the ins and outs of OpSwitch having undefined reconvergence behavior.

This will be an area of difference between HLSL and Slang going forward since HLSL is giving users a defined behavior for this case.

@csyonghe
Copy link
Collaborator

csyonghe commented Feb 25, 2025

I tried @llvm-beanz's test shader and Slang is producing invalid SPIRV. We have more fundamental issues regarding fallthrough switch support.

The latest thinking is that we can continue to lower all switches without fallthrough as OpSwitch, and for switches with non-trivial fallthroughs, we need to break them into chunks, and emit an outer level of switch + standalone ifs for each fallthrough case. The outer switch will contain one case target block for each break point in the original swich. This should provide the convergence guarantee without giving in to a full fallback to cascading/chained ifs.

@csyonghe
Copy link
Collaborator

Basically, for

switch (value)
{
case 0:
      x();
case 1:
     y();
case 2:
     z();
     break;
default:
      w();
      break;
}

we can lower into something like:

switch (value)
{
case 0:
case 1:
case 2:
     if (value == 0)
          x();
     if (value == 1)
          y();
     z();
     break;
default:
      w();
      break;
}

@llvm-beanz
Copy link
Author

The lowering we intend to implement in Clang is something like the example below.
Given this source:

switch(selector) {
  default:
    something();
  case 1:
  case 3:
    somethingElse();
    break;
  case 2:
    anotherThing();
    break;
}

It should be accurate to expand it to:

do {
  if (!(selector == 1 && selector == 2 && selector == 3)) {
    something();
    selector = 1; // This case falls through so we now need to execute case 1.
  }
  if (selector == 1) {
    somethingElse();
    break; // breaks can still be breaks;
  }
  if (selector == 2) {
    anotherThing();
    break;
  }
} while(false)

This has the advantage of generating deterministic reconvergence.

Generally the optimizations that a backend could perform with OpSwitch provide little benefit over what can be done with a traditional branch, and it has been our opinion that the undefined behavior caused by the SPIRV OpSwitch is a larger drawback than any potential performance gain.

@csyonghe
Copy link
Collaborator

For the example in llvm/llvm-project#112056:

switch(selector) {
  default:
    something();
  case 1:
  case 3:
    somethingElse();
    break;
  case 2:
    anotherThing();
    break;
}

We can lower as:

outer: switch(selector) {
  default:
  case 1:
  case 3:
    something();
    switch(selector)
    {
    case 1:
    case 3:
            somethingElse();
            break outer;
    }
    break;
  case 2:
    anotherThing();
    break;
}

@csyonghe
Copy link
Collaborator

OpSwitch has well defined reconvergence behavior when there is only trivial fallthrough, so it may still be the better option here.

The reason to prefer switch is that it enables O(1) implementation of branching. Consider the case where you have 100+ or even 1000+ cases, switch will allow this to be implemented as a single lookup/address arithmetic to get the jump target, instead of running 1000 nested branches.

@tangent-vector
Copy link
Contributor

@csyonghe I don't think your example above is correct, although I agree with you on the basic approach that can be taken.

Given your input:

switch(selector) {
  default:
    something();
  case 1:
  case 3:
    somethingElse();
    break;
  case 2:
    anotherThing();
    break;
}

I believe the correct output is:

switch(selector) {
  default:
  case 1:
  case 3:
    switch(selector)
    {
    default:
        something();
        break;
    case 1:
    case 3:
        break;
    }
    somethingElse();
    break;
  case 2:
    anotherThing();
    break;
}

(And then the redundant case 1 and case 3 that are grouped with default in the outer switch can be eliminated)

@tangent-vector
Copy link
Contributor

@llvm-beanz In your example expansion, introducing the do-while loop to make breaks work as intended would change the meaning of any continues under the switch. of course at the CFG level that would just mean that you turned those continues into what amount to multi-level continues, and need a pass to eliminate those when emitting an IR that only allows single-level break/continue.

@llvm-beanz
Copy link
Author

Yes, to avoid using OpSwitch you either have additional lowering logic for break or continue.

In either case we believe that the complexity in the front end is required to deterministically meet user expectations for behavior.

@tangent-vector
Copy link
Contributor

Yong and I have discussed this issue offline and come to the conclusion that Slang should change its lowering of switch statements with non-trivial fallthrough so that it avoid the nondeterministic reconvergence behavior in SPIR-V.

We will still need to perform a more thorough audit of all of the targets that Slang supports to determine what, if any, guarantees we are able to provide from the high-level-language compiler about divergence/reconvergence behavior across targets.

@tangent-vector tangent-vector added kind:bug something doesn't work like it should priority:low labels Feb 26, 2025
@tangent-vector tangent-vector changed the title What is the expected convergence behavior for slang switch statements with fallthrough? Change SPIR-V lowering of switch with non-trivial fallthrough to avoid undefined reconvergence behavior Feb 26, 2025
@bmillsNV bmillsNV added this to the Q2 2025 (Spring) milestone Feb 27, 2025
@bmillsNV bmillsNV added the goal:quality & productivity Quality issues and issues that impact our productivity coding day to day inside slang label Feb 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
goal:quality & productivity Quality issues and issues that impact our productivity coding day to day inside slang kind:bug something doesn't work like it should
Projects
None yet
Development

No branches or pull requests

4 participants