Commit ed97330

common: promote sparse functionality
1 parent 26853d0 commit ed97330


58 files changed: +296, -864 lines

.github/automation/x64/build_linters.sh (+1, -2)

~~~diff
@@ -13,7 +13,6 @@ if [[ "$ONEDNN_ACTION" == "configure" ]]; then
         -DCMAKE_BUILD_TYPE=debug \
         -DONEDNN_BUILD_GRAPH=ON \
         -DDNNL_EXPERIMENTAL=ON \
-        -DDNNL_EXPERIMENTAL_SPARSE=ON \
         -DDNNL_EXPERIMENTAL_PROFILING=ON \
         -DDNNL_EXPERIMENTAL_UKERNEL=ON \
         -DONEDNN_EXPERIMENTAL_LOGGING=ON \
@@ -25,7 +24,7 @@ if [[ "$ONEDNN_ACTION" == "configure" ]]; then
     set +x
 elif [[ "$GITHUB_JOB" == "pr-format-tags" ]]; then
     set -x
-    cmake -B../build -S. -DONEDNN_BUILD_GRAPH=OFF -DDNNL_EXPERIMENTAL_SPARSE=ON
+    cmake -B../build -S. -DONEDNN_BUILD_GRAPH=OFF
     set +x
 else
     echo "Unknown linter job: $GITHUB_JOB"
~~~

cmake/dnnl_compat.cmake (+1, -2)

~~~diff
@@ -1,5 +1,5 @@
 #===============================================================================
-# Copyright 2021-2024 Intel Corporation
+# Copyright 2021-2025 Intel Corporation
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -34,7 +34,6 @@ endmacro()
 
 set(COMPAT_CACHE_BOOL_VARS
     "EXPERIMENTAL"
-    "EXPERIMENTAL_SPARSE"
    "EXPERIMENTAL_UKERNEL"
    "EXPERIMENTAL_LOGGING"
    "VERBOSE"
~~~

cmake/options.cmake (-5)

~~~diff
@@ -203,11 +203,6 @@ option(DNNL_EXPERIMENTAL
     using environment variables."
     OFF) # disabled by default
 
-option(DNNL_EXPERIMENTAL_SPARSE
-    "Enable experimental functionality for sparse domain. This option works
-    independently from DNNL_EXPERIMENTAL."
-    OFF) # disabled by default
-
 option(DNNL_EXPERIMENTAL_UKERNEL
     "Enable experimental functionality for ukernels. This option works
     independently from DNNL_EXPERIMENTAL."
~~~

doc/Doxyfile.in (+2, -2)

~~~diff
@@ -1,5 +1,5 @@
 #===============================================================================
-# Copyright 2016-2022 Intel Corporation
+# Copyright 2016-2025 Intel Corporation
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1962,7 +1962,7 @@ INCLUDE_FILE_PATTERNS =
 # recursively expanded use the := operator instead of the = operator.
 # This tag requires that the tag ENABLE_PREPROCESSING is set to YES.
 
-PREDEFINED = DOXYGEN_SHOULD_SKIP_THIS DNNL_GPU_RUNTIME=DNNL_RUNTIME_OCL DNNL_WITH_SYCL DNNL_USE_SYCL_BUFFERS DNNL_EXPERIMENTAL_SPARSE DNNL_EXPERIMENTAL_UKERNEL DNNL_EXPERIMENTAL_LOGGING
+PREDEFINED = DOXYGEN_SHOULD_SKIP_THIS DNNL_GPU_RUNTIME=DNNL_RUNTIME_OCL DNNL_WITH_SYCL DNNL_USE_SYCL_BUFFERS DNNL_EXPERIMENTAL_UKERNEL DNNL_EXPERIMENTAL_LOGGING
 
 # If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then this
 # tag can be used to specify a list of macro names that should be expanded. The
~~~

doc/advanced/experimental.md (-237)

~~~diff
@@ -27,250 +27,13 @@ Both kinds of experimental features can be enabled simultaneously.
 
 | Build time option                          | Description                                                         |
 |:-------------------------------------------|:--------------------------------------------------------------------|
-| ONEDNN_EXPERIMENTAL_SPARSE                 | Enable experimental API and functionality for sparse domain.         |
 | ONEDNN_EXPERIMENTAL_UKERNEL                | Enable experimental microkernel APIs and functionalities.            |
 | ONEDNN_EXPERIMENTAL_PROFILING              | Enable experimental profiling API.                                   |
 | ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_BACKEND | Enable experimental graph compiler backend of the graph component.   |
 | ONEDNN_EXPERIMENTAL_LOGGING                | Enable experimental logging support for oneDNN verbose mode.         |
 
 ## Features details
 
~~~
~~~diff
-### ONEDNN_EXPERIMENTAL_SPARSE
-This option extends the existing API and adds a new one to support sparse
-functionality in oneDNN.
-
-#### API
-
-The main change is in oneDNN memory object semantics. Now, the memory object can
-have multiple underlying buffers. In the case of regular dense computations, the
-memory object always contains a single buffer. But in the case of sparse
-computations, the memory object always contains one buffer for values and an
-arbitrary number of additional buffers for metadata.
-
-The underlying buffers are enumerated starting with 0, meaning that each buffer
-has its own number. The buffer with values always has index 0.
-
-In most cases, the API that works with underlying buffers takes a buffer index. The
-exception is the API for creating a memory object. In that case, the API takes a vector
-of buffers. The order of the buffers in the vector matters and should correspond to
-the buffers' indices.
-
-oneDNN also introduces a new format kind dnnl::memory::format_kind::sparse.
-Sparse encoding (a.k.a. sparse format) is an enumeration type that specifies
-how data is encoded. Currently, oneDNN supports Compressed Sparse Row (CSR),
-Sorted Co-ordinate (COO) Sparse Format, and PACKED sparse encodings
-(dnnl::memory::sparse_encoding::csr, dnnl::memory::sparse_encoding::coo,
-dnnl::memory::sparse_encoding::packed) for CPU engine, and, only sorted
-COO (Co-ordinate Sparse Format) for GPU engine.
-
-The memory descriptor has dedicated static member functions for creating memory
-descriptors for different sparse encodings.
-
-Each encoding defines the number and meaning of the buffers.
-
-| Sparse encoding | Buffers                                                                      |
-|:----------------|:-----------------------------------------------------------------------------|
-| CSR             | 0 - values, 1 - indices, 2 - pointers                                         |
-| Sorted COO      | 0 - values, 1 to *ndims* - indices (*ndims* - number of tensor dimensions)    |
-| PACKED          | The meaning and content are unspecified                                       |
-
-The pseudocode below demonstrates how to create a memory object
-for the CSR and COO sparse encodings and use the new API to work with the
-underlying handles.
-
-###### CSR Encoding:
-~~~cpp
-using namespace dnnl;
-const memory::dim M = 4, N = 6;
-const memory::dim nnz = 5;
-const auto values_dt = memory::data_type::f32;
-const auto indices_dt = memory::data_type::s32;
-const auto pointers_dt = memory::data_type::s32;
-
-// Create a memory descriptor for CSR sparse encoding.
-const auto csr_md = memory::desc::csr(
-        {M, N}, // Dimensions
-        values_dt, // Data type of values
-        nnz, // Number of non-zero entries
-        indices_dt, // Data type of indices (metadata)
-        pointers_dt); // Data type of pointers (metadata)
-
-// A sparse matrix represented in the CSR format.
-std::vector<float> csr_values = {2.5f, 1.5f, 1.5f, 2.5f, 2.0f};
-std::vector<int32_t> csr_indices = {0, 2, 0, 5, 1};
-std::vector<int32_t> csr_pointers = {0, 1, 2, 4, 5, 5};
-
-// Create a memory object for the given buffers with values and metadata.
-memory csr_mem(csr_md, engine, {
-        csr_values.data(), // Buffer with values
-        csr_indices.data(), // Buffer with indices (metadata)
-        csr_pointers.data() // Buffer with pointers (metadata)
-        });
-
-const auto values_sz = csr_mem.get_size(0);
-const auto indices_sz = csr_mem.get_size(1);
-const auto pointers_sz = csr_mem.get_size(2);
-
-assert(values_sz == csr_values.size() * sizeof(float));
-assert(indices_sz == csr_indices.size() * sizeof(int32_t));
-assert(pointers_sz == csr_pointers.size() * sizeof(int32_t));
-
-void *values_handle = csr_mem.get_data_handle(0);
-void *indices_handle = csr_mem.get_data_handle(1);
-void *pointers_handle = csr_mem.get_data_handle(2);
-
-assert(values_handle == (void *)csr_values.data());
-assert(indices_handle == (void *)csr_indices.data());
-assert(pointers_handle == (void *)csr_pointers.data());
-~~~
~~~
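The CSR buffers in the removed example above can be sanity-checked without oneDNN. The following plain-Python sketch (illustrative only, not part of the commit) decodes the value/index/pointer buffers back into the dense 4x6 matrix they encode. Note that only M + 1 = 5 entries of `csr_pointers` are consumed for M = 4 rows, so the trailing element in the example buffer goes unused.

```python
# Decode the CSR example buffers into a dense M x N matrix.
# Buffer contents are copied from the removed documentation example.
M, N = 4, 6
csr_values = [2.5, 1.5, 1.5, 2.5, 2.0]
csr_indices = [0, 2, 0, 5, 1]       # column index of each stored value
csr_pointers = [0, 1, 2, 4, 5, 5]   # row r owns values[pointers[r]:pointers[r+1]]

dense = [[0.0] * N for _ in range(M)]
for row in range(M):
    for k in range(csr_pointers[row], csr_pointers[row + 1]):
        dense[row][csr_indices[k]] = csr_values[k]

print(dense)
```

Row 2, for example, owns the slice `values[2:4]`, placing 1.5 at column 0 and 2.5 at column 5.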
~~~diff
-
-###### Sorted COO Encoding:
-~~~cpp
-using namespace dnnl;
-const memory::dim M = 4, N = 6;
-const memory::dim nnz = 5;
-const auto values_dt = memory::data_type::f32;
-const auto indices_dt = memory::data_type::s32;
-
-// Create a memory descriptor for COO sparse encoding.
-const auto coo_md = memory::desc::coo(
-        {M, N}, // Dimensions
-        values_dt, // Data type of values
-        nnz, // Number of non-zero entries
-        indices_dt); // Data type of indices (metadata)
-
-// A sparse matrix represented in the COO format.
-std::vector<float> coo_values = {2.5f, 1.5f, 1.5f, 2.5f, 2.0f};
-std::vector<int32_t> coo_row_indices = {0, 1, 2, 2, 3};
-std::vector<int32_t> coo_col_indices = {0, 2, 0, 5, 1};
-
-// Create a memory object for the given buffers with values and metadata.
-memory coo_mem(coo_md, engine, {
-        coo_values.data(), // Buffer with values
-        coo_row_indices.data(), // Buffer with row indices (metadata)
-        coo_col_indices.data() // Buffer with column indices (metadata)
-        });
-
-const auto values_sz = coo_mem.get_size(0);
-const auto indices_sz = coo_mem.get_size(1);
-
-assert(values_sz == coo_values.size() * sizeof(float));
-assert(indices_sz == coo_row_indices.size() * sizeof(int32_t));
-assert(indices_sz == coo_col_indices.size() * sizeof(int32_t));
-
-void *values_handle = coo_mem.get_data_handle(0);
-void *row_indices_handle = coo_mem.get_data_handle(1);
-void *col_indices_handle = coo_mem.get_data_handle(2);
-
-assert(values_handle == (void *)coo_values.data());
-assert(row_indices_handle == (void *)coo_row_indices.data());
-assert(col_indices_handle == (void *)coo_col_indices.data());
-~~~
~~~
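The sorted-COO buffers in the example encode the same matrix as the CSR buffers, with each non-zero carrying an explicit (row, column) pair instead of row pointers. A matching plain-Python decode (illustrative only, not part of the commit):

```python
# Decode the sorted-COO example buffers into a dense M x N matrix.
# Buffer contents are copied from the removed documentation example.
M, N = 4, 6
coo_values = [2.5, 1.5, 1.5, 2.5, 2.0]
coo_row_indices = [0, 1, 2, 2, 3]  # row coordinate of each value
coo_col_indices = [0, 2, 0, 5, 1]  # column coordinate of each value

dense = [[0.0] * N for _ in range(M)]
for v, r, c in zip(coo_values, coo_row_indices, coo_col_indices):
    dense[r][c] = v

print(dense)
```

Because the entries are sorted by row, the column coordinates here match `csr_indices` in the CSR example, and the decoded matrix is identical.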
~~~diff
-
-A memory descriptor created for the sparse encoding PACKED cannot
-be used to create a memory object. It can only be used to create
-a primitive descriptor to query the actual memory descriptor
-(similar to the format tag `any`).
-
-#### Primitives
-
-##### Matrix Multiplication
-
-This option enables the matmul primitive that can work with
-sparse input tensors.
-
-###### CSR encoding
-Supported only for the CPU engine. Only one of the input tensors can be sparse.
-The output tensor is always dense.
-
-The following data type combinations are supported:
-
-| Values (src, weight, dst) | Indices |
-|:--------------------------|:--------|
-| f16, f16, f16             | s32     |
-| f32, f32, f32             | s32     |
-
-The following format tags are supported for dense input/output
-tensors:
-
-* ab
-
-See the example [here](@ref cpu_matmul_csr_cpp).
-
-Benchdnn can be used to test matmul with a CSR input tensor as follows:
-`./benchdnn --matmul --encoding=csr+0.99:: --wtag=ab --dtag=ab 4x1000000:1000000x128`
-
-For the case above, the number of non-zero elements for the source tensor is
-calculated as max(4 * 1000000 * (1 - 0.99), 1).
~~~
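The non-zero count quoted in the removed text follows directly from the sparsity ratio in `--encoding=csr+0.99::` applied to the dense 4x1000000 source shape, floored at one element. A quick arithmetic check in Python (illustrative only):

```python
# nnz implied by a 0.99 sparsity ratio for a 4 x 1000000 source tensor,
# per the formula quoted above: max(4 * 1000000 * (1 - 0.99), 1).
rows, cols = 4, 1_000_000
sparsity = 0.99
nnz = max(int(rows * cols * (1 - sparsity)), 1)
print(nnz)  # 40000
```

The `max(..., 1)` floor matters only at extreme sparsity, where the product would otherwise round down to zero non-zeros.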
~~~diff
-
-###### COO encoding
-Supported only for the CPU and GPU engines. Only one of the input tensors can
-be sparse. The output tensor is always dense.
-
-The following data type combinations are supported:
-
-| Values (src, weight, dst) | Indices |
-|:--------------------------|:--------|
-| f16, f16, f16             | s32     |
-| f32, f32, f32             | s32     |
-
-The following format tags are supported for dense weights tensor:
-
-* ab
-* ba
-
-The following format tags are supported for dense destination tensor:
-
-* ab
-
-See the example [here](@ref cpu_matmul_coo_cpp).
-
-Benchdnn can be used to test matmul with a COO input tensor as follows:
-`./benchdnn --matmul --encoding=coo+0.99:: --wtag=ab --dtag=ab 4x1000000:1000000x128`
-
-For the case above, the number of non-zero elements for the source tensor is
-calculated as max(4 * 1000000 * (1 - 0.99), 1).
-
-###### PACKED encoding
-
-Only the weights tensor is allowed to be sparse. The other tensors
-are always dense.
-
-In general, it is expected that all matmul related functionality (e.g. post-ops,
-scales, zero-points, etc) that is supported for the dense weights should
-also work for the sparse weights.
-
-Currently, matmul has the following limitations for the PACKED encoding:
-* Supported only for the CPU engine
-* Only Intel Advanced Matrix Extensions (Intel AMX) instruction set
-architecture (ISA) is supported
-* Only `s8` data type for the weights is supported
-* Only 1 batch dimension is supported
-
-See the example [here](@ref cpu_matmul_weights_compression_cpp).
-
-Benchdnn can be used to test matmul with the PACKED weights tensor as follows:
-`./benchdnn --matmul --dt=s8:s8:s32 --encoding=:packed+0.99: 3x512x1024:1x1024x512`
-
-For the case above, the number of non-zero elements for the weights tensor is
-calculated as max(1024 * 512 * (1 - 0.99), 1).
-
-##### Reorder
-
-Currently, there is only one reorder for packing a dense tensor, i.e. converting
-a dense tensor that is in `ab` format to a sparse tensor that is encoded with
-the `PACKED` encoding.
-
-In general, it is expected that all reorder-related functionality
-(e.g. scales, zero-points, etc) that is supported for the dense
-destination tensor should also work for the sparse one.
-
-#### Common Limitations
-* The interoperability API to get/set data handles is not supported. Use the
-runtime agnostic API to do that.
-* Sparse memory and memory descriptor can only be used with the Matrix
-Multiplication and Reorder primitives.
-
 ### ONEDNN_EXPERIMENTAL_UKERNEL
 
 This option enables a new set of CPU-only APIs to support block-level
~~~
