Commit b17701c

Matrix docs update (shader-slang#1815)
* #include an absolute path didn't work - because paths were taken to always be relative. * Update matrix documentation. * Small fixes. * Some small fixes. * Fixes and improvements to matrix doc. * Small fixes. * Additional matrix doc layout clarification.
1 parent 27c18d1 commit b17701c

1 file changed

+130
-40
lines changed

docs/user-guide/a1-01-matrix-layout.md

+130-40
Original file line number | Diff line number | Diff line change
@@ -3,26 +3,141 @@ layout: user-guide
33
---
44

55
Handling Matrix Layout Differences on Different Platforms
6-
============================
6+
=========================================================
77

8-
The differences on default matrix layout or storage conventions between GLSL (OpenGL/Vulkan) and HLSL has been an issue that frequently causes confusion among developers. When writing applications that work on different targets, one important goal that developers frequently seek is to make it possible to pass the same matrix generated by host code to the same shader code, regardless of what graphics API is being used (e.g. Vulkan, OpenGL or Direct3D). As a solution to shader cross-compilation, Slang provides necessary tools for developers navigate around the differences between GLSL and HLSL targets.
8+
The difference in default matrix layout or storage conventions between GLSL (OpenGL/Vulkan) and HLSL has been an issue that frequently causes confusion among developers. When writing applications that work on different targets, one important goal that developers frequently seek is to make it possible to pass the same matrix generated by host code to the same shader code, regardless of what graphics API is being used (e.g. Vulkan, OpenGL or Direct3D). As a solution to shader cross-compilation, Slang provides the necessary tools for developers to navigate around the differences between GLSL and HLSL targets.
9+
10+
A high level summary:
11+
12+
* Default matrix **layout** in memory for Slang is `column-major`.
13+
* This default is for *legacy* reasons and may change in the future.
14+
* Row-major layout is the only *portable* layout to use across targets (with significant caveats for non-4x4 matrices)
15+
* Use `setMatrixLayoutMode`/`spSetMatrixLayoutMode`/`createSession` to set the default
16+
* Use `-matrix-layout-row-major` or `-matrix-layout-column-major` for the command line
17+
* or via `spProcessCommandLineArguments`/`processCommandLineArguments`
18+
* Depending on your host maths library, matrix sizes and targets, it may be necessary to convert matrices at the host/kernel boundary
19+
20+
On the portability issue, some targets *ignore* the matrix layout mode, notably CUDA and CPU/C++. For this reason, to support the widest breadth of targets it is recommended to use *row-major* matrix layout.
921

1022
Two conventions of matrix transform math
11-
-------------------------
12-
Depending on the platform a developer is used to, a matrix-vector transform can be expressed as either `v*m` (`mul(v, m)` in HLSL), or `m*v` (`mul(m,v)` in HLSL). This convention, together with the matrix layout (column-major or row-major), determines how a matrix should be filled out in host code. One way to make things less ambiguous is to think about where the translation terms should be placed in memory when filling a typical 4x4 transform matrix.
23+
----------------------------------------
24+
25+
Depending on the platform a developer is used to, a matrix-vector transform can be expressed as either `v*m` (`mul(v, m)` in HLSL), or `m*v` (`mul(m,v)` in HLSL). This convention, together with the matrix layout (column-major or row-major), determines how a matrix should be filled out in host code.
26+
27+
In HLSL/Slang the order of the vector and matrix parameters to `mul` determines how the *vector* is interpreted. This interpretation is required because a vector does not, in and of itself, differentiate between being a row or a column.
28+
29+
* `mul(v, m)` - `v` is interpreted as a row vector
30+
* `mul(m, v)` - `v` is interpreted as a column vector
31+
32+
Through this mechanism a developer is able to write transforms in their preferred style.
33+
34+
These two styles are not directly interchangeable - for a given `v` and `m`, generally `mul(v, m) != mul(m, v)`. To switch from one style to the other the matrix needs to be transposed, so
35+
36+
* `mul(v, m) == mul(transpose(m), v)`
37+
* `mul(m, v) == mul(v, transpose(m))`
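
The two identities above can be checked on the host with a small sketch. The `Vec4`/`Mat4` types and the `mulRow`/`mulCol` helpers below are hypothetical stand-ins for `mul(v, m)` and `mul(m, v)`; they are not part of Slang or of any particular maths library, and integer elements are used only so that results compare exactly.

```cpp
// Hypothetical host-side types for illustration only (not a Slang API).
struct Vec4 { int x[4]; };
struct Mat4 { int e[4][4]; };  // e[row][col]: a logical matrix, no memory layout implied

// Stand-in for mul(v, m): v is interpreted as a row vector.
constexpr Vec4 mulRow(const Vec4& v, const Mat4& m) {
    Vec4 r{};
    for (int c = 0; c < 4; ++c)
        for (int k = 0; k < 4; ++k)
            r.x[c] += v.x[k] * m.e[k][c];
    return r;
}

// Stand-in for mul(m, v): v is interpreted as a column vector.
constexpr Vec4 mulCol(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int k = 0; k < 4; ++k)
            r.x[row] += m.e[row][k] * v.x[k];
    return r;
}

// Swap rows and columns of the logical matrix.
constexpr Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (int row = 0; row < 4; ++row)
        for (int c = 0; c < 4; ++c)
            t.e[c][row] = m.e[row][c];
    return t;
}

// Element-wise comparison of two vectors.
constexpr bool equal(const Vec4& a, const Vec4& b) {
    for (int i = 0; i < 4; ++i)
        if (a.x[i] != b.x[i]) return false;
    return true;
}
```

With these helpers, `equal(mulRow(v, m), mulCol(transpose(m), v))` holds for any `v` and `m`, whereas `mulRow(v, m)` and `mulCol(m, v)` generally differ.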
38+
39+
This behavior is *independent* of how a matrix is laid out in memory. Host code needs to be aware of how shader code will interpret a matrix stored in memory, its layout, as well as the vector interpretation convention used in shader code (i.e. `mul(v, m)` or `mul(m, v)`).
40+
41+
[Matrix layout](https://en.wikipedia.org/wiki/Row-_and_column-major_order) can be either `row-major` or `column-major`. The difference just determines which elements are contiguous in memory. `Row-major` means the elements of each row are contiguous. `Column-major` means the elements of each column are contiguous.
42+
43+
Another way to think about this difference is in terms of where translation terms should be placed in memory when filling a typical 4x4 transform matrix. When transforming a row vector (i.e. `mul(v, m)`) with a `row-major` matrix layout, translation will be at element offsets `m + 12, 13, 14`. For a `column-major` matrix layout, translation will be at `m + 3, 7, 11`.
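
As a minimal sketch of where those offsets come from, the hypothetical helpers below (the names are illustrative, not from Slang) map a logical `(row, col)` position of a 4x4 matrix to an element offset for each layout. With the row vector convention the translation terms sit in the last row, at `(3, 0)`, `(3, 1)` and `(3, 2)`.

```cpp
// Element offset of logical position (row, col) in a 4x4 matrix.
// Hypothetical helper names, for illustration only.

// Row-major: the elements of each row are contiguous.
constexpr int offsetRowMajor(int row, int col) { return row * 4 + col; }

// Column-major: the elements of each column are contiguous.
constexpr int offsetColumnMajor(int row, int col) { return col * 4 + row; }

// With the row vector convention (mul(v, m)) translation is in the last row.
constexpr int kTranslationRow = 3;
```

Evaluating `offsetRowMajor(kTranslationRow, c)` for `c = 0, 1, 2` gives 12, 13, 14, and `offsetColumnMajor(kTranslationRow, c)` gives 3, 7, 11, matching the offsets quoted above.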
44+
45+
Note it is a *HLSL*/*Slang* convention that the parameter ordering of `mul(v, m)` means `v` is a *row* vector. A host maths library *could* have a transform function `SomeLib::transform(v, m)` such that `v` is interpreted as a *column* vector. For simplicity's sake the remainder of this discussion assumes that the equivalent of `mul(v, m)` in host code follows the interpretation that `v` is a *row* vector.
46+
47+
Discussion
48+
----------
49+
50+
There are four variables in play here:
51+
52+
* Host vector interpretation (row or column) - and therefore effective transform order (column) `m * v` or (row) `v * m`
53+
* Host matrix memory layout
54+
* Shader vector interpretation (as determined via `mul(v, m)` or `mul(m, v)`)
55+
* Shader matrix memory layout
56+
57+
Since each item can be either `row` or `column` there are 16 possible combinations. For simplicity let's reduce the variable space by making some assumptions.
58+
59+
1) The same vector convention will be used in host code as in shader code.
60+
2) The host maths matrix layout is the same as the kernel.
61+
62+
If we accept 1, then we can ignore the vector interpretation because as long as they are consistent then only matrix layout is significant.
63+
If we accept 2, then there are only two possible combinations - either both host and shader are using `row-major` matrix layout or `column-major` layout.
64+
65+
This is simple, but is perhaps not the end of the story. First, let's assume that we want our Slang code to be as portable as possible. As previously discussed, for CUDA and C++/CPU targets Slang ignores the matrix layout settings - the matrix layout is *always* `row-major`.
66+
67+
Second, let's consider performance. The matrix layout in a host maths library is not arbitrary from a performance point of view. A performant host maths library will want to use SIMD instructions. With both x86/x64 SSE and ARM NEON SIMD it makes a performance difference which layout is used, depending on whether `column` or `row` is the *preferred* vector interpretation. If the `row` vector interpretation is preferred, it is most performant to have `row-major` matrix layout. Conversely, if `column` vector interpretation is preferred, `column-major` matrix layout will be the most performant.
68+
69+
The performance difference comes down to a SIMD implementation having to do a transpose if the layout doesn't match the preferred vector interpretation.
70+
71+
If we put this all together - best performance, consistency between vector interpretations and platform independence - we get:
72+
73+
1) Consistency : Same vector interpretation in shader and host code
74+
2) Platform independence: Kernel uses `row-major` matrix layout
75+
3) Performance: Host vector interpretation should match host matrix layout
76+
77+
The only combination that fulfills all aspects is `row-major` matrix layout and `row` vector interpretation for both host and kernel.
78+
79+
It's worth noting that for targets that honor the default matrix layout, that setting can act like a toggle that transposes a matrix layout. If for some reason the combination of choices leads to inconsistent vector transforms, an implementation can perform this transpose in *host* code at the boundary between host and the kernel. This is not the most performant or convenient scenario, but if supported in an implementation it could be used for targets that do not support kernel matrix layout settings.
80+
81+
If only targeting platforms that honor matrix layout, there is more flexibility. Our constraints are
1382

14-
If the shader code writes `mul(m, v)`, then the last **column** of `m` defines the translation terms. If we use row-major matrix layout, then the host code should make sure the translation terms are filled in at `m + 4, 7, 11` locations in memory.
83+
1) Consistency : Same vector interpretation in shader and host code
84+
2) Performance: Host vector interpretation should match host matrix layout
1585

16-
Alternatively, if the shader code writes `mul(v, m)`, then the last **row** of `m` defines the translation terms. When using row-major matrix layout, the host code should make sure the translation terms are filled in at `m + 12, 13, 14` locations in memory.
86+
Then there are two combinations that work
1787

18-
By default, Slang assumes all matrices to be in **row-major** layout, since this is the most nature layout to work with in CPU code: each row of the matrix occupies contiguous space in memory. A user should stick to one of the above practices to get correct result. Note that this is different from `fxc` which assumes `column_major` layout by default. As an example, if the host code uses `glm` library to generate transform matrices, the translation terms will be stored in `[12], [13], [14]` locations in memory. Therefore, the shader code should stick to the `mul(v,m)` convention to ensure correctness.
88+
1) `row-major` matrix layout for host and kernel, and `row` vector interpretation.
89+
2) `column-major` matrix layout for host and kernel, and `column` vector interpretation.
1990

20-
Slang automatically handles the convention differences when cross-compiling code to GLSL. For example, a `float3x4` matrix will be translated to `mat4x3` in the resulting GLSL. Correspondingly, `mul(v, m)` will be translated to `m*v` in GLSL. Therefore, as long as the user is sticking to the above practices consistently, they will get correct result with the same matrix value in memory regardless of what graphics API they are actually using.
91+
If the host maths library is not performance oriented, it may be arbitrary from a performance point of view whether a `row` or `column` vector interpretation is used. In that case, assuming shader and host vector interpretation is the same, it is only important that the kernel and maths library matrix layouts match.
92+
93+
Another way of thinking about these combinations is to think of each change in `row-major`/`column-major` matrix layout and `row`/`column` vector interpretation as a transpose. If there are an *even* number of flips then all the transposes cancel out. Therefore the following combinations work
94+
95+
| Host Vector | Kernel Vector | Host Mat Layout | Kernel Mat Layout
96+
|-------------|---------------|-----------------|------------------
97+
| Row | Row | Row | Row
98+
| Row | Row | Column | Column
99+
| Column | Column | Row | Row
100+
| Column | Column | Column | Column
101+
| Row | Column | Row | Column
102+
| Row | Column | Column | Row
103+
| Column | Row | Row | Column
104+
| Column | Row | Column | Row
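
The parity rule behind the table can be sketched as follows. The `Kind` enum and function names are hypothetical and exist only for this illustration: a mismatch between host and kernel vector interpretation counts as one transpose, a mismatch between host and kernel matrix layout counts as another, and a combination works when the number of transposes is even.

```cpp
// Hypothetical names, for illustration only.
enum class Kind { Row, Column };

// A combination works when the vector-interpretation mismatch and the
// matrix-layout mismatch are either both absent or both present
// (an even number of implied transposes).
constexpr bool combinationWorks(Kind hostVec, Kind kernelVec,
                                Kind hostMat, Kind kernelMat) {
    return (hostVec != kernelVec) == (hostMat != kernelMat);
}

// Enumerate all 16 combinations and count the ones that work.
constexpr int countWorkingCombinations() {
    const Kind kinds[2] = {Kind::Row, Kind::Column};
    int count = 0;
    for (int hv = 0; hv < 2; ++hv)
        for (int kv = 0; kv < 2; ++kv)
            for (int hm = 0; hm < 2; ++hm)
                for (int km = 0; km < 2; ++km)
                    if (combinationWorks(kinds[hv], kinds[kv], kinds[hm], kinds[km]))
                        ++count;
    return count;
}
```

Of the 16 possible combinations, this predicate accepts exactly the 8 rows listed in the table.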
105+
106+
To be clear 'Kernel Mat Layout' is the shader matrix layout setting. As previously touched upon, if it is not possible to use the setting (say because it is not supported on a target), then doing a transpose at the host/kernel boundary can fix the issue.
107+
108+
Matrix Layout
109+
-------------
110+
111+
The above discussion is largely around 4x4 32-bit element matrices. For graphics APIs such as Vulkan, GL, and D3D there are typically additional restrictions for matrix layout. One restriction is for 16 byte alignment between rows (for `row-major` layout) and columns (for `column-major` layout).
112+
113+
More CPU-like targets such as CUDA and C++/CPU do not have this restriction, and store all elements consecutively.
114+
115+
This being the case only the following matrix types/matrix layouts will work across all targets. (Listed in the HLSL convention of RxC).
116+
117+
* 1x4 `row-major` matrix layout
118+
* 2x4 `row-major` matrix layout
119+
* 3x4 `row-major` matrix layout
120+
* 4x4 `row-major` matrix layout
121+
122+
These are all `row-major` because, as previously discussed, only `row-major` matrix layout currently works across all targets.
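
A rough sketch of why only the Rx4 sizes line up is shown below. It assumes 32-bit elements and models the graphics-style restriction simply as "each row starts on a 16 byte boundary". Real graphics API layout rules have further details that this does not capture, and the helper names are hypothetical.

```cpp
#include <cstddef>

// Assumption: 32-bit (4 byte) elements, as in the discussion above.
constexpr std::size_t kElementSize = 4;

// CPU-like targets (CUDA, C++/CPU): all elements are consecutive.
constexpr std::size_t packedRowStride(std::size_t cols) {
    return cols * kElementSize;
}

// Graphics-like targets, simplified: each row is padded up to 16 bytes.
constexpr std::size_t alignedRowStride(std::size_t cols) {
    return (cols * kElementSize + 15) / 16 * 16;
}

// A row-major RxC matrix has the same layout on both kinds of target only
// when the padded and packed row strides agree. The row count does not
// affect the stride, so it does not appear here.
constexpr bool sameLayoutOnAllTargets(std::size_t cols) {
    return alignedRowStride(cols) == packedRowStride(cols);
}
```

Under these assumptions, `sameLayoutOnAllTargets(4)` is true while 1, 2 and 3 columns all fail, since their rows are padded from 4, 8 or 12 bytes up to 16.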
123+
124+
NOTE! This only applies to matrices that are passed between host and kernel - any matrix size will work appropriately for variables in shader/kernel code, for example.
125+
126+
The host maths library also plays a part here. The library may hold all elements consecutively in memory. If that's the case it will match the CPU/CUDA kernels, but will only work on 'graphics'-like targets that match that layout for the size.
127+
128+
For SIMD based host maths libraries it can be even more convoluted. If a SIMD library is being used that prefers `row` vector interpretation and therefore will have `row-major` layout, it may for many sizes *not* match the CPU-like consecutive layout. For example, a 4x3 matrix will likely be packed with 16 byte row alignment. Additionally, even if a matrix is packed in the same way it may not be the same size. For example a 3x2 matrix *may* hold the rows consecutively *but* be 16 bytes in size, as opposed to the 12 bytes that a CPU-like kernel will expect.
129+
130+
If a SIMD based host maths library is being used with graphics-like APIs, there is a good chance (but certainly *not* guaranteed) that layout across non-4x4 sizes will match because SIMD typically implies 16 byte alignment.
131+
132+
If your application uses matrix sizes that are not 4x4 across the host/kernel boundary and it wants to work across all targets, it is *likely* that *some* matrices will have to be converted across the boundary. This being the case, having to handle transposing matrices at the boundary is a less significant issue.
133+
134+
In conclusion, if your application has to perform matrix conversion work at the host/kernel boundary, the previous observation that "best performance" implies `row-major` layout and `row` vector interpretation becomes somewhat moot.
21135

22136
Overriding default matrix layout
23-
--------------------------
137+
--------------------------------
138+
139+
Slang allows users to override default matrix layout with a compiler flag. This compiler flag can be specified during the creation of a `Session`:
24140

25-
While we do not recommend so, Slang allows users to override default matrix layout with a compiler flag. This compiler flag can be specified during the creation of a `Session`:
26141
```
27142
slang::IGlobalSession* globalSession;
28143
...
@@ -33,37 +148,12 @@ slang::ISession* session;
33148
globalSession->createSession(slangSessionDesc, &session);
34149
```
35150

36-
This make make Slang treat all matrices as in column-major layout, and emit `column_major` qualifier in resulting code.
37-
38-
Note that if you choose to use column-major layout, you either need to flip the matrix multiplication order in shader code or fill in the matrix in transpose order in host code.
39-
40-
Summary
41-
-------------
42-
43-
In summary, we put together all options you have to ensure correct result:
44-
45-
**Option 1: using row-major matrix layout, and `mul(m, v)` math convention**
46-
47-
- Make sure the host code fills in matrices in the odering so that translation terms are specified in `m[3], m[7], m[11]` elements.
48-
- Leave `defaultMatrixLayoutMode` as default value when creating a Slang session, or specify `SLANG_MATRIX_LAYOUT_ROW_MAJOR`.
49-
- Write `mul(Matrix, Vector)` in shader code to transform `Vector` by `Matrix`.
50-
51-
**Option 2: using row-major matrix layout, and `mul(v, m)` math convention**
52-
53-
- Make sure the host code fills in matrices so that translations terms are specified in `m[12], m[13], m[14]` elements. Matrices filled in this way are compatible with typical OpenGL applications.
54-
- Leave `defaultMatrixLayoutMode` as default value when creating a Slang session, or specify `SLANG_MATRIX_LAYOUT_ROW_MAJOR`.
55-
- Write `mul(Vector, Matrix)` in shader code.
56-
57-
**Option 3: using column-major matrix layout, and `mul(m, v)` math convention**
151+
This makes Slang treat all matrices as in `column-major` layout and, for example, emit the `column_major` qualifier in the resulting HLSL code.
58152

59-
- Make sure the host code fills in matrices in the odering so that translation terms are specified in `m[12], m[13], m[14]` elements. Matrices filled in this way are compatible with typical OpenGL applications.
60-
- Set `defaultMatrixLayoutMode` to `SLANG_MATRIX_LAYOUT_COLUMN_MAJOR` when creating a Slang session.
61-
- Write `mul(Matrix, Vector)` in shader code to transform `Vector` by `Matrix`.
153+
Alternatively the default layout can be set via
62154

63-
**Option 4: using column-major matrix layout, and `mul(v, m)` math convention**
155+
* `setMatrixLayoutMode`/`spSetMatrixLayoutMode` API calls
156+
* `-matrix-layout-row-major` or `-matrix-layout-column-major` command line options
157+
* or via `spProcessCommandLineArguments`/`processCommandLineArguments`
64158

65-
- Make sure the host code fills in matrices so that translations terms are specified in `m[3], m[7], m[11]` elements.
66-
- Set `defaultMatrixLayoutMode` to `SLANG_MATRIX_LAYOUT_COLUMN_MAJOR` when creating a Slang session.
67-
- Write `mul(Vector, Matrix)` in shader code.
68159

69-
And that's all you need to pay attention to. Slang will make sure the remaining details are correctly handled when generating target HLSL/GLSL code.
