Both kinds of experimental features can be enabled simultaneously.

| Build time option                           | Description                                                            |
|:--------------------------------------------|:-----------------------------------------------------------------------|
| ONEDNN_EXPERIMENTAL_SPARSE                  | Enable the experimental API and functionality for the sparse domain.   |
| ONEDNN_EXPERIMENTAL_UKERNEL                 | Enable the experimental microkernel APIs and functionality.            |
| ONEDNN_EXPERIMENTAL_PROFILING               | Enable the experimental profiling API.                                 |
| ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_BACKEND  | Enable the experimental graph compiler backend of the graph component. |
| ONEDNN_EXPERIMENTAL_LOGGING                 | Enable experimental logging support for oneDNN verbose mode.           |
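The options above are build-time (CMake) options. As a minimal configuration sketch (assuming an out-of-source build directory at the repository root; combine flags as needed):

~~~sh
# Configure oneDNN with two experimental features enabled at once.
cmake .. -DONEDNN_EXPERIMENTAL_SPARSE=ON -DONEDNN_EXPERIMENTAL_PROFILING=ON
~~~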
## Feature details

### ONEDNN_EXPERIMENTAL_SPARSE

This option extends the existing API and adds new APIs to support sparse
functionality in oneDNN.

#### API

The main change is in the oneDNN memory object semantics: a memory object can
now have multiple underlying buffers. For regular dense computations, the
memory object always contains a single buffer; for sparse computations, it
always contains one buffer for values and an arbitrary number of additional
buffers for metadata.

The underlying buffers are enumerated starting with 0, so each buffer has its
own index. The buffer with values always has index 0.

In most cases, the API that works with underlying buffers takes a buffer index.
The exception is the API for creating a memory object, which takes a vector of
buffers. The order of the buffers in the vector matters: it must correspond to
the buffers' indices.

oneDNN also introduces a new format kind, dnnl::memory::format_kind::sparse.
A sparse encoding (a.k.a. sparse format) is an enumeration type that specifies
how data is encoded. Currently, oneDNN supports the Compressed Sparse Row (CSR),
sorted co-ordinate (COO), and PACKED sparse encodings
(dnnl::memory::sparse_encoding::csr, dnnl::memory::sparse_encoding::coo,
dnnl::memory::sparse_encoding::packed) for the CPU engine, and only sorted
COO for the GPU engine.

The memory descriptor has dedicated static member functions for creating memory
descriptors for the different sparse encodings.

Each encoding defines the number and meaning of the buffers.

| Sparse encoding | Buffers                                                                    |
|:----------------|:---------------------------------------------------------------------------|
| CSR             | 0 - values, 1 - indices, 2 - pointers                                      |
| Sorted COO      | 0 - values, 1 to *ndims* - indices (*ndims* - number of tensor dimensions) |
| PACKED          | The meaning and content are unspecified                                    |
The pseudocode below demonstrates how to create a memory object
for the CSR and COO sparse encodings and how to use the new API to work with
the underlying handles.

###### CSR Encoding:
~~~cpp
using namespace dnnl;
const memory::dim M = 4, N = 6;
const memory::dim nnz = 5;
const auto values_dt = memory::data_type::f32;
const auto indices_dt = memory::data_type::s32;
const auto pointers_dt = memory::data_type::s32;

// Create a memory descriptor for CSR sparse encoding.
const auto csr_md = memory::desc::csr(
        {M, N}, // Dimensions
        values_dt, // Data type of values
        nnz, // Number of non-zero entries
        indices_dt, // Data type of indices (metadata)
        pointers_dt); // Data type of pointers (metadata)

// A sparse matrix represented in the CSR format.
std::vector<float> csr_values = {2.5f, 1.5f, 1.5f, 2.5f, 2.0f};
std::vector<int32_t> csr_indices = {0, 2, 0, 5, 1};
std::vector<int32_t> csr_pointers = {0, 1, 2, 4, 5, 5};

// Create a memory object for the given buffers with values and metadata.
memory csr_mem(csr_md, engine, {
    csr_values.data(), // Buffer with values
    csr_indices.data(), // Buffer with indices (metadata)
    csr_pointers.data() // Buffer with pointers (metadata)
});

const auto values_sz = csr_mem.get_size(0);
const auto indices_sz = csr_mem.get_size(1);
const auto pointers_sz = csr_mem.get_size(2);

assert(values_sz == csr_values.size() * sizeof(float));
assert(indices_sz == csr_indices.size() * sizeof(int32_t));
assert(pointers_sz == csr_pointers.size() * sizeof(int32_t));

void *values_handle = csr_mem.get_data_handle(0);
void *indices_handle = csr_mem.get_data_handle(1);
void *pointers_handle = csr_mem.get_data_handle(2);

assert(values_handle == (void *)csr_values.data());
assert(indices_handle == (void *)csr_indices.data());
assert(pointers_handle == (void *)csr_pointers.data());
~~~

###### Sorted COO Encoding:
~~~cpp
using namespace dnnl;
const memory::dim M = 4, N = 6;
const memory::dim nnz = 5;
const auto values_dt = memory::data_type::f32;
const auto indices_dt = memory::data_type::s32;

// Create a memory descriptor for COO sparse encoding.
const auto coo_md = memory::desc::coo(
        {M, N}, // Dimensions
        values_dt, // Data type of values
        nnz, // Number of non-zero entries
        indices_dt); // Data type of indices (metadata)

// A sparse matrix represented in the COO format.
std::vector<float> coo_values = {2.5f, 1.5f, 1.5f, 2.5f, 2.0f};
std::vector<int32_t> coo_row_indices = {0, 1, 2, 2, 3};
std::vector<int32_t> coo_col_indices = {0, 2, 0, 5, 1};

// Create a memory object for the given buffers with values and metadata.
memory coo_mem(coo_md, engine, {
    coo_values.data(), // Buffer with values
    coo_row_indices.data(), // Buffer with row indices (metadata)
    coo_col_indices.data() // Buffer with column indices (metadata)
});

const auto values_sz = coo_mem.get_size(0);
const auto indices_sz = coo_mem.get_size(1);

assert(values_sz == coo_values.size() * sizeof(float));
assert(indices_sz == coo_row_indices.size() * sizeof(int32_t));
assert(indices_sz == coo_col_indices.size() * sizeof(int32_t));

void *values_handle = coo_mem.get_data_handle(0);
void *row_indices_handle = coo_mem.get_data_handle(1);
void *col_indices_handle = coo_mem.get_data_handle(2);

assert(values_handle == (void *)coo_values.data());
assert(row_indices_handle == (void *)coo_row_indices.data());
assert(col_indices_handle == (void *)coo_col_indices.data());
~~~

A memory descriptor created for the PACKED sparse encoding cannot
be used to create a memory object. It can only be used to create
a primitive descriptor, from which the actual memory descriptor can be
queried (similar to the format tag `any`).

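As a pseudocode sketch of that flow (the shapes, nnz value, and the surrounding matmul descriptors are hypothetical and follow the examples above):

~~~cpp
// A PACKED memory descriptor for the weights; nnz is the number of
// non-zero entries in the weights tensor.
const auto wei_md = memory::desc::packed({K, N}, memory::data_type::s8, nnz);

// It can be passed when creating a primitive descriptor ...
matmul::primitive_desc matmul_pd(engine, src_md, wei_md, dst_md);

// ... which is then queried for the actual weights memory descriptor,
// usable to create a memory object.
const auto actual_wei_md = matmul_pd.weights_desc();
memory wei_mem(actual_wei_md, engine);
~~~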
#### Primitives

##### Matrix Multiplication

This option enables the matmul primitive to work with
sparse input tensors.

###### CSR encoding
Supported only for the CPU engine. Only one of the input tensors can be sparse.
The output tensor is always dense.

The following data type combinations are supported:

| Values (src, weight, dst) | Indices |
|:--------------------------|:--------|
| f16, f16, f16             | s32     |
| f32, f32, f32             | s32     |

The following format tags are supported for dense input/output
tensors:

* ab

See the example [here](@ref cpu_matmul_csr_cpp).

Benchdnn can be used to test matmul with a CSR input tensor as follows:
`./benchdnn --matmul --encoding=csr+0.99:: --wtag=ab --dtag=ab 4x1000000:1000000x128`

For the case above, the number of non-zero elements for the source tensor is
calculated as max(4 * 1000000 * (1 - 0.99), 1).

###### COO encoding
Supported only for the CPU and GPU engines. Only one of the input tensors can
be sparse. The output tensor is always dense.

The following data type combinations are supported:

| Values (src, weight, dst) | Indices |
|:--------------------------|:--------|
| f16, f16, f16             | s32     |
| f32, f32, f32             | s32     |

The following format tags are supported for the dense weights tensor:

* ab
* ba

The following format tags are supported for the dense destination tensor:

* ab

See the example [here](@ref cpu_matmul_coo_cpp).

Benchdnn can be used to test matmul with a COO input tensor as follows:
`./benchdnn --matmul --encoding=coo+0.99:: --wtag=ab --dtag=ab 4x1000000:1000000x128`

For the case above, the number of non-zero elements for the source tensor is
calculated as max(4 * 1000000 * (1 - 0.99), 1).

###### PACKED encoding

Only the weights tensor is allowed to be sparse. The other tensors
are always dense.

In general, all matmul-related functionality (e.g. post-ops,
scales, zero points, etc.) that is supported for dense weights is expected to
also work for sparse weights.

Currently, matmul has the following limitations for the PACKED encoding:
* Supported only for the CPU engine
* Only the Intel Advanced Matrix Extensions (Intel AMX) instruction set
architecture (ISA) is supported
* Only the `s8` data type for the weights is supported
* Only 1 batch dimension is supported

See the example [here](@ref cpu_matmul_weights_compression_cpp).

Benchdnn can be used to test matmul with a PACKED weights tensor as follows:
`./benchdnn --matmul --dt=s8:s8:s32 --encoding=:packed+0.99: 3x512x1024:1x1024x512`

For the case above, the number of non-zero elements for the weights tensor is
calculated as max(1024 * 512 * (1 - 0.99), 1).
##### Reorder

Currently, there is only one reorder for packing a dense tensor, i.e. converting
a dense tensor in `ab` format to a sparse tensor with
the `PACKED` encoding.

In general, all reorder-related functionality
(e.g. scales, zero points, etc.) that is supported for a dense
destination tensor is expected to also work for a sparse one.

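As a pseudocode sketch of the packing reorder (assuming a PACKED destination descriptor is accepted the same way primitives accept format tag `any`; shapes, nnz, engine, and stream are hypothetical):

~~~cpp
// Dense source in ab format and a PACKED destination descriptor.
const auto src_md = memory::desc({M, N}, memory::data_type::s8,
        memory::format_tag::ab);
const auto dst_md = memory::desc::packed({M, N}, memory::data_type::s8, nnz);

// Query the actual destination descriptor via the reorder primitive
// descriptor, create the destination memory, and run the reorder.
reorder::primitive_desc reorder_pd(engine, src_md, engine, dst_md);
memory dst_mem(reorder_pd.dst_desc(), engine);
reorder(reorder_pd).execute(stream, src_mem, dst_mem);
~~~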
#### Common Limitations

* The interoperability API to get/set data handles is not supported. Use the
runtime-agnostic API instead.
* Sparse memory objects and memory descriptors can only be used with the Matrix
Multiplication and Reorder primitives.

### ONEDNN_EXPERIMENTAL_UKERNEL
This option enables a new set of CPU-only APIs to support block-level