
Slang takes ~30x as long as shaderc to compile a simple compute shader #6358

Closed
NBickford-NV opened this issue Feb 14, 2025 · 10 comments
Labels
goal:quality & productivity Quality issues and issues that impact our productivity coding day to day inside slang


@NBickford-NV
Contributor

NBickford-NV commented Feb 14, 2025

Hi Slang team! I've been running into some issues affecting hot-reload workflows, where re-compiling small shaders is common.

The ToT version of Slang (as of 944c19b) takes 48-49 ms on my Windows computer to compile the following 841 bytes of source to SPIR-V. This imports no modules, does not include Slang global session creation time or I/O time, uses optimization level 0, and is averaged over 128 runs. I've included a benchmark you can use to reproduce this issue; more information about it below.

// shader.slang
struct PushConstantCompute
{
  uint64_t bufferAddress;
  uint     numVertices;
};

struct Vertex
{
  float3 position;
};

[[vk::push_constant]]
ConstantBuffer<PushConstantCompute> pushConst;

[shader("compute")]
[numthreads(256, 1, 1)]
void main(uint3 threadIdx : SV_DispatchThreadID)
{
  uint index = threadIdx.x;

  if(index >= pushConst.numVertices)
    return;

  Vertex* vertices = (Vertex*)pushConst.bufferAddress;

  float angle = (index + 1) * 2.3f;

  float3 vertex = vertices[index].position;

  float cosAngle = cos(angle);
  float sinAngle = sin(angle);
  float3x3 rotationMatrix = float3x3(
    cosAngle, -sinAngle, 0.0,
    sinAngle,  cosAngle, 0.0,
         0.0,       0.0, 1.0
  );

  float3 rotatedVertex = mul(rotationMatrix, vertex);

  vertices[index].position = rotatedVertex;
}

The options and targets used in my benchmark are:

    m_options = {{slang::CompilerOptionName::EmitSpirvDirectly, {slang::CompilerOptionValueKind::Int, 1}},        //
                 {slang::CompilerOptionName::VulkanUseEntryPointName, {slang::CompilerOptionValueKind::Int, 1}},  //
                 {slang::CompilerOptionName::Optimization, {slang::CompilerOptionValueKind::Int, 0}}};
    m_targets = {slang::TargetDesc{.format = SLANG_SPIRV, .profile = m_globalSession->findProfile("spirv_1_6")}};

In comparison, shaderc (using shaderc_shared from the 1.4.304.0 Vulkan SDK) compiles the GLSL equivalent in ~1.6 ms, about 30 times as quickly:

// shader.comp.glsl
#version 460
#extension GL_EXT_buffer_reference2 : require
#extension GL_EXT_scalar_block_layout : require
#extension GL_EXT_shader_explicit_arithmetic_types : require

layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

struct PushConstantCompute
{
  uint64_t bufferAddress;
  uint     numVertices;
};

layout(push_constant, scalar) uniform PushConsts
{
  PushConstantCompute pushConst;
};

struct Vertex
{
  vec3 position;
};

layout(buffer_reference, scalar) buffer VertexBuffer
{
  Vertex data[];
};

void main()
{
  uint index = gl_GlobalInvocationID.x;

  if(index >= pushConst.numVertices)
    return;

  VertexBuffer vertices = VertexBuffer(pushConst.bufferAddress);

  float angle = (index + 1) * 2.3f;

  vec3 vertex = vertices.data[index].position;

  float cosAngle = cos(angle);
  float sinAngle = sin(angle);
  mat3 rotationMatrix = mat3(
    cosAngle, -sinAngle, 0.0,
    sinAngle,  cosAngle, 0.0,
         0.0,       0.0, 1.0
  );

  vec3 rotatedVertex = rotationMatrix * vertex;

  vertices.data[index].position = rotatedVertex;
}

The generated SPIR-V files are similar, although shaderc's is slightly larger.

I've put together a benchmark at https://github.com/NBickford-NV/slang-compile-timer to test this under controlled conditions. It first initializes each shader compiler, then times how long it takes to compile a shader to SPIR-V 128 times and averages the results. (Varying the number of repetitions doesn't change the result much, so the first compilation isn't significantly more expensive.)

To build the benchmark (currently only tested on Windows), run:

git clone --recursive https://github.com/NBickford-NV/slang-compile-timer.git
cd slang-compile-timer
mkdir cmake_build
cd cmake_build
cmake ..
cmake --build . --parallel

I've included a Release binary compiled using Visual Studio 2022 17.12.3.

Then to benchmark Slang, run ./slang-compile-timer shader.slang:

Loaded shader.slang; 841 bytes.
Compiler initialization time: 262.936300 ms
Compiling 128 times...
Repetition 1
Repetition 2
Repetition 4
Repetition 8
Repetition 16
Repetition 32
Repetition 64
Repetition 128
Average compilation time: 48.232041 ms
SPIR-V output is 1512 bytes long.

And to benchmark shaderc, run ./slang-compile-timer --shaderc shader.comp.glsl:

Loaded shader.comp.glsl; 1117 bytes.
Compiler initialization time: 0.060100 ms
Compiling 128 times...
Repetition 1
Repetition 2
Repetition 4
Repetition 8
Repetition 16
Repetition 32
Repetition 64
Repetition 128
Average compilation time: 1.542900 ms
SPIR-V output is 2568 bytes long.

Thank you!

package.zip

@csyonghe
Collaborator

Thanks for providing this benchmark, we will look into this issue.

@csyonghe csyonghe self-assigned this Feb 14, 2025
@csyonghe csyonghe added this to the Q1 2025 (Winter) milestone Feb 14, 2025
@csyonghe csyonghe added the goal:quality & productivity Quality issues and issues that impact our productivity coding day to day inside slang label Feb 14, 2025
@csyonghe
Collaborator

I was able to improve performance in this benchmark quite a bit in #6396, but I should also note that Slang will never be as fast as a GLSL compiler, similar to how a C++ compiler can never be as fast as a C compiler, due to the more powerful type system and more flexible compiler architecture.

There might still be one or two things we can do from here to get another 20-30% speedup, but it is unlikely we can get much better performance on small examples like these.

Note that there are a lot of components in the compiler that incur a flat cost at the beginning, which is meant to be amortized when compiling larger code. The advanced type system in Slang often allows users to write more compact, generic code, all of which helps with compile time when handling more complex application code.

In particular, Slang allows you to precompile modules into .slang-module files, so you never need to reparse the same module twice. In this example, if you first convert the .slang file to a .slang-module file and then generate code from there, you will be able to bypass the front end entirely and get much shorter compile times.

@NBickford-NV
Contributor Author

Thank you @csyonghe! I'll build #6396 and verify the performance improvement.

I had some ideas for further optimizations (e.g. I saw malloc/free and dynamic_cast relatively high on the list when I ran a performance profile, so I was thinking about looking at mimalloc and at the assembly MSVC generates for dynamic_cast), but I'll need to re-run the profile to see if that's still the case.

Just to check my understanding, the Slang module system wouldn't help for small shaders like this, right?

@csyonghe
Collaborator

If compile time is a concern, the idea is to always precompile all .slang files into Slang modules before your application starts, and then use link-time specialization instead of preprocessor-based specialization to remove type checking from the application's runtime entirely.

https://shader-slang.org/slang/user-guide/link-time-specialization.html

@NBickford-NV
Contributor Author

Thank you @csyonghe! I now get an average compilation time of 14-16 ms on bca772c, ~3x as fast as before. That's a good performance improvement!

In case it helps, the flame graph I get for this looks like this:

[flame graph image]

And the bottom-up view of time spent inside each function, excluding subfunctions:

[bottom-up profile image]

@NBickford-NV
Contributor Author

I took a quick look into dynamicCast -- the codegen there looks OK (although moving IRInst::getOperands() into the header, since LTO didn't inline it, improves performance by about 4%, from 14.5 ms to 13.8 ms on my system). The main issue is that dynamicCast is called a lot -- 275041 times per compilation. 73125 (26%) of these return at the nullptr check; 62700 (23%) return because T::isaImpl() succeeds; all but 88 (0.03%) of the rest return at the final nullptr, although the unwrap is checked 137244 times. So the biggest performance improvements will probably be algorithmic.

@csyonghe
Collaborator

csyonghe commented Mar 1, 2025

Yes, this is consistent with the profiling results I've been seeing; there is no obvious bottleneck in the system.

I am not seeing any low-hanging fruit here that would significantly improve performance from the current state.

Since it is not clear that there are any actionable items, would you mind if we close this issue? I think users can avoid paying the front-end cost anyway if they architect their code to use Slang modules.

@NBickford-NV
Contributor Author

Sure, this can be closed; I feel like there's probably more to find here (should compiling a file this small require 275K dynamic casts?), but this speedup is good to see. I'll create another issue for the larger-scale benchmark testing out modules if I find issues there. Thanks!

@csyonghe
Collaborator

csyonghe commented Mar 1, 2025

I agree that there are definitely more optimizations to be had, but it is unlikely that a single change or fix is going to make things significantly different, and the ROI will be diminishing.

Just to show why the type system is complex, here is an example of what needs to happen when checking a simple x+1 expression:

  1. Check the type of x, and find it to be float.
  2. Look up all overloads of +; there are 30+ overloads of operator+ in the standard library.
  3. For generic operator+ candidates, check whether the generic can be specialized with the operand types. This means building the type inheritance list of all argument types, a complicated process that needs to take into account all potential extensions that may apply to float.
  4. After all that is done, compute the type coercion cost for each candidate and pick the best option.

Step 3 there can spawn a lot of additional checks, because Slang allows things like:

extension<T: IInterface1> T : IInterface2 {}

That makes every type that conforms to IInterface1 also conform to IInterface2, so if we have something like:

T operator+<T:IInterface2>(T, T)

then to know whether this candidate is applicable, we not only need to know whether an argument type conforms to IInterface1, but also whether any extensions exist that make it conform to IInterface2, and so on.

Compared to this, the checking step in GLSL, which has no generics, is much simpler.

There are certainly more algorithmic optimizations we can do to make things faster; in fact, Slang already uses caches to hold checked subtype relationships and operator overload resolution results, but there are more opportunities for optimization.

But this should give you an idea of the complexity of the type system, and hopefully it explains why Slang's type checking is going to take more time than a GLSL compiler's.

@csyonghe
Collaborator

csyonghe commented Mar 1, 2025

I am going to close the issue now, but I am happy to work with anyone who is interested in optimizing the compiler to see if there is more we can do here.

@csyonghe csyonghe closed this as completed Mar 1, 2025