Commit 1b89f78
Capabilities System, CapabilitySet Logic Overhaul (shader-slang#4145)
* Capabilities System, Backing Logic Overhaul

  Fixes shader-slang#4015

  Problems to address:
  1. The capabilities system currently spends anywhere from 25-50% of compile time in the CapabilityVisitor. Most of this time is spent on join logic: (1) finding abstract atoms and (2) comparing list1<->list2. This should and can be made significantly faster.
  2. The error system does not produce errors with auxiliary information. This will require a partial redesign to provide more useful semantic information for debugging.

  What was addressed:
  1. The array-backed `CapabilityConjunctionSet` was replaced in favor of a `UIntSet`-backed `CapabilityTargetSets`. The design is described below; a rough structural sketch also follows this message.

     Design:
     * `CapabilityTargetSets` is a `Dictionary<targetAtom, CapabilityTargetSet>`. This is not an array for two reasons:
       1. A dictionary makes it easy to figure out which target is missing between two `CapabilityTargetSets`.
       2. Statically allocating an array would require the preprocessor to manually annotate which capability is a target and link that capability to an index, so a dictionary would be required for lookup regardless of the implementation.
     * `CapabilityTargetSet` is an intermediate representation of all capabilities for a single `target` atom (`glsl`, `hlsl`, `metal`, ...). This structure contains a dictionary of all stage-specific capability sets, for fast lookup of the stage capabilities supported by a `CapabilitySet` for a `target` atom. This reduces the number of sets searched.
     * `CapabilityStageSet` is an intermediate representation of all capabilities for a single `stage` atom (`vertex`, `fragment`, ...). This structure holds all disjoint capability sets for a `stage`. A disjoint set is rare, but may exist in some scenarios, for example: `{glsl, EXT_GL_FOO}{glsl, _GLSL_130, _GLSL_150}`. This reduces the number of sets searched.
     * `UIntSet` is the main reason for the redesign, for better performance and memory usage. Every set operation takes only a few word-wise operations, making all set logic trivial and cheap to run. All algorithms were modified to center on `UIntSet` operations.
  2. Errors
     * Semantic information is now better linked to the calling function, providing a function<->function_body connection when saving semantic information for errors.
     * Missing targets now print errors much like other error codes, by finding code which could be a cause of the incompatibility.

  What is missing:
  1. Non-naive support for non-stage-specific capabilities such as `{hlsl, _sm_5_0}`. Currently, non-stage-specific targets emulate the behavior by assigning such capabilities to every stage: `{hlsl, _sm_5_0, vertex} {hlsl, _sm_5_0, fragment}...`. Removing this behavior would eliminate the redundant shader-stage sets built at construction time (~80% of the new implementation's runtime). This is an addition, not an overhaul.
  2. Optionally: `UIntSet` should be modified to support SIMD operations for significantly faster operations. This is not required immediately, since `UIntSet` is already not a performance constraint.

  Notes:
  * `UIntSet` had implementation bugs which were fixed in this PR.
  * The old capabilities system had bugs which were fixed in this PR while transforming to the new implementation.
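As a reading aid (not part of the commit): the layering described under "Design" can be pictured roughly as below. This is a minimal sketch using std containers as stand-ins for Slang's `Dictionary`/`List`/`UIntSet`; the field names (`m_targetSets`, `m_stageSets`, `atomSets`), the `CapabilitySetSketch` name, and the `CapabilityAtom` alias are illustrative assumptions, not the PR's actual declarations.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using CapabilityAtom = int32_t;               // each capability atom maps to one bit
using AtomBitSet     = std::vector<uint64_t>; // stand-in for UIntSet: packed bit words

// All capabilities for one stage atom (vertex, fragment, ...).
struct CapabilityStageSet
{
    // Disjoint atom sets for this stage; usually exactly one, rarely more,
    // e.g. {glsl, EXT_GL_FOO} alongside {glsl, _GLSL_130, _GLSL_150}.
    std::vector<AtomBitSet> atomSets;
};

// All capabilities for one target atom (glsl, hlsl, metal, ...).
struct CapabilityTargetSet
{
    // stage atom -> stage set, so stage lookups only touch the relevant sets.
    std::unordered_map<CapabilityAtom, CapabilityStageSet> m_stageSets;
};

// The whole capability set.
struct CapabilitySetSketch
{
    // target atom -> target set; a dictionary (not an array) so the target
    // missing between two sets can be found directly, without a static
    // atom -> index table generated by the preprocessor.
    std::unordered_map<CapabilityAtom, CapabilityTargetSet> m_targetSets;
};
```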
* fix .natvis debug view
* Small optimizations I found while working on the addition. The AST-building pass now looks like this:
  * 1% = ~capabilitySet
  * 2% = capabilitySet()
  * 1.5% = capabilitySet::unionWith()
  * 0.8% = capabilitySet::join()
  * 1.5% = auxiliary info for debugging
  * ~0.5-1% = extra visitor overhead
  * ~5% total for the visitor, ~6.5% total runtime cost
* fix caps which were wrong but worked
* push minor syntax fix (still looking for why other tests fail)
* perf & bug fixes
  1. did not properly remake isBetterForTarget for the this->empty case, with that treated as Invalid. This is the best case in this scenario.
  2. Remade the serializer for stdlib generation. Faster (more direct) & cleaner code. NOTE: did not address review comments.
* fix glsl.meta caps error
* fixing findBest logic again & the UIntSet wrapper: findBest was not checking for 'more specialized' targets & the element counter was flawed
* faster getElements algorithm + natvis for UIntSet + wrong warning
* type incompatibility of bitscanForward implementations
* try to fix warnings again
* remove ptr for clang intrinsic
* add missing header
* ifdef to allow clang compile
* compiler hackery to fix up platform/type-independent operations
* bracket
* fix MSVC error
* missing template
* change types out again
* changes to fix compiling
* adjustment to parameter for Clang/GCC
* added iterator to delay processing all atomSets of a CapabilitySet
* add a few missing consts
* ensure we never have more than 1 disjointSet: added a wrapper + assert + union functionality to all possible disjoint sets. This was done in favor of removing the LinkedList for two reasons: (1) we still need 0-1 set functionality; (2) might as well keep the code and just disallow the problematic functionality.
* address review comments: non-linked-list refactor review comments addressed; add doc comments + remove redundant code
* comments + remove isValid for bool operator
* push removal of linkedlist for capabilities
* add missing break
* address review comments: minor adjustments of syntax
* push a fix to the `CapabilitySet({shader, missing target})` code
* quality + error: (1) add iterator to UIntSet; (2) do not specialize target_switch if the profile is derived from the case (GLSL_150 is not compatible with GLSL_400)
* fix target_switch erroring + temporarily remove UIntSet::Iterator. It will be added afterwards; testing code on CI first so I can multi-task fixing the UIntSet iterator
* fix the UIntSet iterator
* Revert "fix the UIntSet iterator" temporarily to pull from master
* add metal error as per texture.slang (took a while; I realize this was why things were breaking, likely should adjust errors to reflect this)
* Rework UIntSet to have a template for the output type. This is done so it is reasonable to debug the iterator output instead of dealing with messy ints. Fix problems with the implemented iterators + invalid-capabilities handling.
* removed incorrect `__target_switch` capability: barycentric was being used in anticipation of `profile glsl450`; this does not expand into `GL_EXT_fragment_shader_barycentric` and instead caused an error which is hidden during cross-compile.
* remove some uses of getElements
* remove undeclared_stage for now
* remove redundant code associated with `undeclared_stage`
* remove unused variable
* address review: specifically, removed a static in a thread-dangerous scope. Now using a `const static` for read-only (thread-safe) data which precompile steps generate
* move GLSL_150 capdef change to sm_4_1 (more accurate)
* address most review comments; did not address: shader-slang#4145 (comment)
* revert incorrect code review suggestion
* push changes for all code review suggestions
1 parent 3b0de8b commit 1b89f78

30 files changed: +1777 −1474 lines

source/core/slang-linked-list.h

+1 −1

@@ -323,7 +323,7 @@ template <typename T> class LinkedList
         }
         return rs;
     }
-    int getCount() { return count; }
+    int getCount() const { return count; }
 };
 } // namespace Slang
 #endif
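The one-line change above adds a `const` qualifier to `getCount()`. Purely as an illustration (not code from the PR), the toy example below shows why that matters: without the qualifier, the count cannot be queried through a `const` reference to the list.

```cpp
#include <cstdio>

template<typename T>
class MiniList
{
public:
    int getCount() const { return m_count; } // const: callable on const objects
private:
    int m_count = 0;
};

// Compiles only because getCount() is a const member function.
static void report(const MiniList<int>& list)
{
    std::printf("count = %d\n", list.getCount());
}

int main()
{
    MiniList<int> list;
    report(list);
    return 0;
}
```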

source/core/slang-uint-set.cpp

+31 −31

@@ -3,18 +3,6 @@
 namespace Slang
 {
 
-static bool _areAllZero(const UIntSet::Element* elems, Index count)
-{
-    for (Index i = 0; count; ++i)
-    {
-        if (elems[i])
-        {
-            return false;
-        }
-    }
-    return true;
-}
-
 UIntSet& UIntSet::operator=(UIntSet&& other)
 {
     m_buffer = _Move(other.m_buffer);
@@ -49,14 +37,8 @@ void UIntSet::setAll()
 
 void UIntSet::resize(UInt size)
 {
-    const Index oldCount = m_buffer.getCount();
     const Index newCount = Index((size + kElementMask) >> kElementShift);
-    m_buffer.setCount(newCount);
-
-    if (newCount > oldCount)
-    {
-        ::memset(m_buffer.getBuffer() + oldCount, 0, (newCount - oldCount) * sizeof(Element));
-    }
+    resizeBackingBufferDirectly(newCount);
 }
 
 void UIntSet::clear()
@@ -66,17 +48,7 @@ void UIntSet::clear()
 
 bool UIntSet::isEmpty() const
 {
-    const Element*const src = m_buffer.getBuffer();
-    const Index count = m_buffer.getCount();
-
-    for (Index i = 0; i < count; ++i)
-    {
-        if (src[i])
-        {
-            return false;
-        }
-    }
-    return true;
+    return _areAllZero(m_buffer.getBuffer(), m_buffer.getCount());
 }
 
 void UIntSet::clearAndDeallocate()
@@ -106,7 +78,7 @@ bool UIntSet::operator==(const UIntSet& set) const
 
     const Index minCount = Math::Min(aCount, bCount);
 
-    return ::memcmp(aElems, bElems, minCount) == 0 &&
+    return ::memcmp(aElems, bElems, minCount*sizeof(Element)) == 0 &&
         _areAllZero(aElems + minCount, aCount - minCount) &&
         _areAllZero(bElems + minCount, bCount - minCount);
 }
@@ -123,6 +95,15 @@ void UIntSet::intersectWith(const UIntSet& set)
     }
 }
 
+void UIntSet::subtractWith(const UIntSet& set)
+{
+    const Index minCount = Math::Min(this->m_buffer.getCount(), set.m_buffer.getCount());
+    for (Index i = 0; i < minCount; i++)
+    {
+        this->m_buffer[i] = this->m_buffer[i] & (~set.m_buffer[i]);
+    }
+}
+
 /* static */void UIntSet::calcUnion(UIntSet& outRs, const UIntSet& set1, const UIntSet& set2)
 {
     outRs.m_buffer.setCount(Math::Max(set1.m_buffer.getCount(), set2.m_buffer.getCount()));
@@ -162,5 +143,24 @@ void UIntSet::intersectWith(const UIntSet& set)
     return false;
 }
 
+Index UIntSet::countElements() const
+{
+    // TODO: This can be made faster using SIMD intrinsics to count set bits.
+    uint64_t tmp;
+    constexpr Index loopSize = ((sizeof(Element) / sizeof(tmp)) != 0) ? sizeof(Element) / sizeof(tmp) : 1;
+    Index count = 0;
+    for (auto index = 0; index < this->m_buffer.getCount(); index++)
+    {
+        for (auto i = 0; i < loopSize; i++)
+        {
+            tmp = m_buffer[index] >> (sizeof(tmp) * i);
+            tmp = tmp - ((tmp >> 1) & 0x5555555555555555);
+            tmp = (tmp & 0x3333333333333333) + ((tmp >> 2) & 0x3333333333333333);
+            count += ((tmp + (tmp >> 4) & 0xF0F0F0F0F0F0F0F) * 0x101010101010101) >> 56;
+        }
+    }
+    return count;
+}
+
 }
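Two of the additions above are worth unpacking: `subtractWith` implements set difference word-by-word as `a & ~b`, and `countElements` counts set bits with the classic SWAR (SIMD-within-a-register) popcount. The standalone snippet below is not part of the commit; `popcount64` is a made-up helper name, and it uses the canonical form of the bit-counting steps on plain `uint64_t` words.

```cpp
#include <cstdint>
#include <cstdio>

// SWAR popcount of one 64-bit word: sum adjacent 2-bit, then 4-bit, then
// 8-bit groups, and finally add all byte sums via one multiply.
static uint64_t popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ull);                           // 2-bit sums
    x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull); // 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0Full;                           // 8-bit sums
    return (x * 0x0101010101010101ull) >> 56;                             // add all bytes
}

int main()
{
    uint64_t a = 0b10110110; // bits {1, 2, 4, 5, 7} set
    uint64_t b = 0b00000110; // bits {1, 2} set

    // Word-wise set difference, the same identity subtractWith applies per element.
    uint64_t diff = a & ~b;  // bits {4, 5, 7}

    std::printf("|a| = %llu, |a \\ b| = %llu\n",
        (unsigned long long)popcount64(a),
        (unsigned long long)popcount64(diff)); // prints: |a| = 5, |a \ b| = 3
    return 0;
}
```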