Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promote bitfieldExtract and bitfieldInsert to become Slang intrinsics #5020

Closed
wants to merge 30 commits into from

Conversation

natevm
Copy link
Contributor

@natevm natevm commented Sep 6, 2024

For context, see the related issue here

In short, bitfield insertion and extraction are useful tools for reinterpreting structures where types have differing sizes, as they allow for simpler and more hardware-efficient instructions for byte manipulation.

Before this change, bitfieldExtract and bitfieldInsert were only supported outside of the compiler, when using -allow-glsl.

Here, I'm introducing KIROp_BitfieldExtract and KIOROp_BitfieldInsert instructions, which by default generate the original fallback bitfield manipulation logic but in CLikeSourceEmitter.

Then, when emitting SPIR-V or GLSL, these compilation paths override this default logic to instead tap into more specialized intrinsics.

@csyonghe csyonghe added the pr: non-breaking PRs without breaking changes label Sep 6, 2024
Copy link
Collaborator

@csyonghe csyonghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to come with a test.

source/slang/core.meta.slang Outdated Show resolved Hide resolved
source/slang/slang-emit-c-like.cpp Outdated Show resolved Hide resolved
@natevm
Copy link
Contributor Author

natevm commented Sep 6, 2024

This needs to come with a test.

Agreed. Looks like the current test suite is passing which is good. I’ll see if I can write something up which tests on a couple different target backends.

@natevm
Copy link
Contributor Author

natevm commented Sep 8, 2024

@csyonghe What's the correct approach to constructing vectors for the CUDA backend?

bitfieldExtract is a component-wise intrinsic, so I need to support a function like this as a right hand side which can be assigned to the left:

uint4 bfe(uint4 val, int off, int bits) {
    return ((val >> off) & ((1u << bits) - 1));
}

However, every time I try to construct an int4 from off, I get errors similar to the following:

nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "unsigned int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"

I've tried constructing an int4 as int4(off, off, off, off) but this does not work either.

I could try emitting a make_int4 as a special case for CUDA, but this doesn't generalize easily to uint8_t, uint16_t, or uint64_t. (I suppose I could make a lookup table… but that seems a bit heavy handed…)

@natevm
Copy link
Contributor Author

natevm commented Sep 9, 2024

For the time being, looks like a special case lookup table works for CUDA.

@natevm
Copy link
Contributor Author

natevm commented Sep 9, 2024

On adding these tests, I noticed that the current logic in glsl.meta.slang for bitfield extraction in the signed integer case was incorrect. For signed integers, the extracted bits are expected to be sign-extended, but the fallback for this wasn’t doing sign extension.

The current logic now correctly accounts for this, and I have some explicit tests to back that up now too.

@@ -0,0 +1,56 @@
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-cpu -output-using-type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to cover all the other targets, with this line duplicated where -cpu is replaced with -vk, -d3d, -metal and -cuda.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add these in and see if the existing tests pass. We might also want to consider adding overrides for metal and cuda if these languages have direct bitfield extraction intrinsics.

For CUDA, at least for scalars, we could be using the following PTX:

unsigned int bitfieldExtract(unsigned int val, int pos, int len) { 
    unsigned int r; asm("bfe.u32 %0, %1, %2, %3;" : "=r"(r) : "r"(val), "r"(pos), "r"(len)); 
    return r; 
}
unsigned int bitfieldInsert(unsigned int src, unsigned int dst, int pos, int len) { 
    unsigned int r; 
    asm("bfi.b32 %0, %1, %2, %3, %4;" : "=r"(r) : "r"(src), "r"(dst), "r"(pos), "r"(len)); 
    return r; 
}

We'd need to add a .s32 for the sign extension case, but that's not bad. I can look into a solution for Metal as well, but hopefully in the meantime the c-like fallback will work there too.

@natevm
Copy link
Contributor Author

natevm commented Sep 9, 2024

Looks like there are some newly failing tests on my end after adding these flags for the other targets. I'll see if I can figure out what's going on.

For i32 extract

  • using -cpu seems to be throwing error : invalid operands to binary expression ('Vector<int32_t, 4>' (aka 'Vector<int, 4>') and 'int32_t' (aka 'int'))
    -- Fixed, was missing a vector type for cpu targets
  • all other targets pass

For i64 extract

  • Using -cpu again throws similar invalid operand error.
    -- Fixed, was missing a vector type for cpu targets
  • Using -vk throws unexpected type given to bitfieldExtract in SPIR-V emit. I might have forgotten to ensure 64-bit values work in the SPIR-V specific logic...

The same failure patterns occur for i32/64 insertion. I'm not sure yet about for metal. Hopefully the CI will give us some answers there.

@natevm
Copy link
Contributor Author

natevm commented Sep 10, 2024

I'm a bit confused with the Vulkan spec... I think I might have found a typo?...

According to the spec:

The Base operand of any OpBitCount, OpBitReverse, OpBitFieldInsert, OpBitFieldSExtract, or OpBitFieldUExtract instruction must be a 32-bit integer scalar or a vector of 32-bit integers.

This appears even if I require SpvCapabilityInt64. At least with NVIDIA, I'm finding that the tests here reveal that 64-bit integers are supported, and do give correct results. But evidently there is ambiguity on what exactly the shaderInt64 capability means.

@natevm
Copy link
Contributor Author

natevm commented Sep 10, 2024

It appears the two outstanding issues are:

  1. When SLANG_RUN_SPIRV_VALIDATION=1, SPIR-V validation does not correctly consider how SpvCapabilityInt64 extends the capabilities of OpBitfieldSExtract/OpBitfieldUExtract/OpBitfieldInsert intrinsics. This is causing the one windows CI build to fail. Setting SLANG_RUN_SPIRV_VALIDATION=0 and ignoring the validation errors, I get correct results.

This appears to be a regression caused by this PR from Samsung: KhronosGroup/SPIRV-Tools#4758. Apparently their hardware does not support 64-bit values for certain intrinsics.

  1. For Metal, it appears that when targetting release mode, Slang appears to be translating my vec<int64_t, 4> Metal type into int64_t4, which is not an existing type in the Metal shading language.

So for 1, unfortunately because of some restrictions with Samsung's HW, there is no way to really tap into the fast path HW on NVIDIA architectures for non-32-bit types with bitfield extraction / insertion. So I'll have to think about some workaround for non-32-bit types...

For 2, this seems like a bug with the current Slang Metal backend, and isn't directly tied to my changes as far as I can tell.

@csyonghe For 2, do you know what's causing the release build on MacOS to fail with vec4(int64_t, N), but the debug build on MacOS to pass?

@csyonghe
Copy link
Collaborator

You can add -skip-spirv-validation in the //TEST line to disable validation for that specific test. You can search for it in the codebase to see how it is used elsewhere for similar situations.

@csyonghe
Copy link
Collaborator

Only release build runs the full suite of tests, debug build doesn't run most tests.

@natevm
Copy link
Contributor Author

natevm commented Sep 12, 2024

You can add -skip-spirv-validation in the //TEST line to disable validation for that specific test. You can search for it in the codebase to see how it is used elsewhere for similar situations.

I did try to use this flag like so:
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-vk -skip-spirv-validation

But this throws an error that -skip-spirv-validation is not a valid command line argument. Perhaps it's because I'm using this TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK) pattern?

The other test using this flag follows a different pattern:
//TEST:SIMPLE(filecheck=SPIRV):-target spirv -entry computeMain -stage compute -emit-spirv-directly -skip-spirv-validation

@csyonghe
Copy link
Collaborator

For compute tests, you need -slang -skip-spirv-validation.

@natevm
Copy link
Contributor Author

natevm commented Sep 13, 2024

For 2, this seems like a bug with the current Slang Metal backend, and isn't directly tied to my changes as far as I can tell.

@csyonghe For 2, do you know what's causing the release build on MacOS to fail with vec4(int64_t, N), but the debug build on MacOS to pass?

@csyonghe looks like this is the last major issue at the moment. I don't have a Mac, so I can't reproduce this locally to debug it. Looks like 64-bit integer vectors broken in general in the Metal backend. Perhaps we should file an issue to resolve that.

@csyonghe
Copy link
Collaborator

You should be able to fix this by changing the way we emit vector types when the element type is 64bit int in slang-emit-metal.cpp, look for kIROp_VectorType in that file to find the place to emit the type.

You can test this locally on a windows machine by adding a
//TEST:SIMPLE:-target metal
On the offending test file to trigger a compile only test with the source, that will run the metal compiler on windows to give you any metal compile errors without actually running the shader.

@natevm
Copy link
Contributor Author

natevm commented Sep 15, 2024

@csyonghe looks like all relevant tests are passing.

There’s one windows CI machine which is reporting that bash is missing, but I see the same error on ToT, so I’m assuming that’s okay to see.

Copy link
Contributor Author

@natevm natevm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick pass over braces

source/slang/slang-emit-c-like.cpp Outdated Show resolved Hide resolved
source/slang/slang-emit-glsl.cpp Outdated Show resolved Hide resolved
source/slang/slang-emit-glsl.cpp Outdated Show resolved Hide resolved
source/slang/slang-emit-spirv.cpp Outdated Show resolved Hide resolved
source/slang/slang-emit-spirv.cpp Outdated Show resolved Hide resolved
m_writer->emitRawText(std::to_string(N).c_str());
}
// Special handling required for Metal target
else if (isMetalTarget(getTargetReq()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need this, and instead we should make sure the metal backend can emit all vector types correctly.

return true;
}

void CLikeSourceEmitter::emitVecNOrScalar(IRVectorType* vectorType, std::function<void()> emitComponentLogic)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming suggestion: maybeEmitMakeVectorFromScalar. Instead of doing call-back style with std::function (which is a virtual function call), just use bool maybeEmitMakeVectorFromScalar(IRType* type); and void maybeCloseMakeVectorFromScalar(bool v).

@@ -538,7 +538,8 @@ bool MetalSourceEmitter::tryEmitInstExprImpl(IRInst* inst, const EmitOpInfo& inO

void MetalSourceEmitter::emitVectorTypeNameImpl(IRType* elementType, IRIntegerValue elementCount)
{
emitSimpleTypeImpl(elementType);
// NM: Passing count here, as Metal 64-bit vector type names do not match their scalar equivalents.
emitSimpleTypeKnowingCount(elementType, elementCount);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can special case the 64bit types in the switch statement below, and emit vec<T, N> instead of TN for those types. We shouldn't need the emitSimpleTypeKnowingCount() change.

@natevm
Copy link
Contributor Author

natevm commented Sep 28, 2024

Apologies on the delay here. Been a bit tied up with a move between states. I should have a bit more time to work on this in the upcoming week.

@CLAassistant
Copy link

CLAassistant commented Nov 25, 2024

CLA assistant check
All committers have signed the CLA.

@natevm
Copy link
Contributor Author

natevm commented Dec 8, 2024

@kaizhangNV Going to try digging into this today. Sounds like there's not much left needed to get this done.

@natevm natevm requested a review from a team as a code owner December 8, 2024 21:10
@kaizhangNV
Copy link
Contributor

It looks like the main concern for metal is addressed already.
The only thing remaining is #5020 (comment)

@kaizhangNV
Copy link
Contributor

@csyonghe This PR is in a good shape. In the latest change, I see Nate already addressed the special handling for metal backend. Do you have any other concerns except the naming suggestion? I feel it's not worth for me to take over this as I probably will have to create a new PR.

switch (bitWidth)
{
case 8:
one = "uint8_t(1)";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be correct for wgsl backend. The int type names there are different (i.e. i32, u32 etc.), we should call emitType instead.

//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-dx12 -use-dxil
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-mtl
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-cpu
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-cuda
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we testing on wgpu?


const IntInfo i = getIntTypeInfo(elementType);

if (i.width == 64)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should already be done when we emit the 16/64bit int type itself, so not necessary here.

@csyonghe
Copy link
Collaborator

@kaizhangNV The remaining work here is to make sure it works on WGPU. @natevm do you have time for this or do you want @kaizhangNV to take over?

@csyonghe
Copy link
Collaborator

@kaizhangNV has already landed this change in another PR, so closing this one.

@csyonghe csyonghe closed this Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr: non-breaking PRs without breaking changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants