
Commit 8d48e46

wzt1997 authored and TaoLv committed
doc: graph: enhance documentation for dynamic dequantize with grouped quantization
1 parent 55b8e29 commit 8d48e46

File tree

1 file changed: +39 -16 lines changed


doc/graph/operations/DynamicDequantize.md

@@ -3,11 +3,11 @@ DynamicDequantize {#dev_guide_op_dynamicdequantize}
 ## General

-DynamicDequantize operation converts a quantized (s8 or u8) tensor to a f32
-tensor. It supports both per-tensor and per-channel asymmetric linear
-de-quantization. Rounding mode is library-implementation defined. Unlike the
-@ref dev_guide_op_dequantize, DynamicDequantize takes scales and zero-points as
-operator src tensors.
+The Dynamic Dequantize operation converts a quantized (s4, u4, s8, or u8) tensor
+to a bf16, f16, or f32 tensor. It supports per-tensor, per-channel, and
+per-group asymmetric linear de-quantization. The rounding mode is defined by the
+library implementation. Unlike @ref dev_guide_op_dequantize, Dynamic Dequantize
+takes scales and zero-points as operator src tensors.

 For per-tensor de-quantization

@@ -16,12 +16,23 @@ For per-tensor de-quantization
 For per-channel de-quantization, taking channel axis = 1 as an example:
 \f[ {dst}_{\cdots,i,\cdots,\cdots} = (src_{\cdots,i,\cdots,\cdots} - zps_i)*scales_i, i\in [0,channelNum-1] \f]
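As a quick numeric illustration of the per-tensor/per-channel formulas above, here is a plain-Python sketch with made-up values (not oneDNN code):

```python
# Hypothetical values, for illustration only: a 2x3 quantized tensor
# dequantized per-channel along axis = 1 (one scale/zp per column).
src = [[-4, 0, 8],
       [ 2, 6, -2]]
scales = [0.5, 0.25, 1.0]  # channelNum = 3 scales
zps    = [  2,    0,  -2]  # channelNum = 3 zero-points

# dst[.., i, ..] = (src[.., i, ..] - zps[i]) * scales[i]
dst = [[(v - zps[i]) * scales[i] for i, v in enumerate(row)]
       for row in src]
print(dst)  # [[-3.0, 0.0, 10.0], [0.0, 1.5, 0.0]]
```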

+For per-group de-quantization, take group shape = Gx1 as an example. It
+indicates that one scaling factor will be adopted for G elements in the src
+tensor. On the dimensions where group quantization is adopted, make channelNum
+equal to the dimension of src and groupNum equal to channelNum / groupSize:
+\f[ {dst}_{i,\cdots} = (src_{i,\cdots} - zps_j)*scales_j, i\in [0,channelNum-1], j\in [0,groupNum-1] \f]
+Where:
+\f[ i = j*groupSize+k, k\in [0,groupSize-1] \f]
+On other dimensions:
+\f[ {dst}_{i,\cdots} = (src_{i,\cdots} - zps_i)*scales_i, i\in [0,channelNum-1] \f]
+
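The per-group indexing (element i uses the scale and zero-point of group j = i // groupSize) can be sketched in plain Python; the numbers are made up for illustration and are not from the oneDNN sources:

```python
# Hypothetical sketch of per-group de-quantization along one dimension:
# channelNum = 6 src elements, groupSize = 2, so groupNum = 3.
group_size = 2
src    = [10, 12, 40, 44, -8, -6]
scales = [0.5, 0.25, 1.0]   # one scale per group (groupNum = 3)
zps    = [  2,    0,  -2]   # one zero-point per group

# dst[i] = (src[i] - zps[i // groupSize]) * scales[i // groupSize]
dst = [(v - zps[i // group_size]) * scales[i // group_size]
       for i, v in enumerate(src)]
print(dst)  # [4.0, 5.0, 10.0, 11.0, -6.0, -4.0]
```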
 ## Operation attributes

 | Attribute Name | Description | Value Type | Supported Values | Required or Optional |
 |:-------------------------------------------|:---------------------------------------------------------------------|:-----------|:------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------|
 | [qtype](@ref dnnl::graph::op::attr::qtype) | Specifies which de-quantization type is used. | string | `per_tensor` (default), `per_channel` | Optional |
-| [axis](@ref dnnl::graph::op::attr::axis) | Specifies dimension on which per-channel de-quantization is applied. | s64 | A s64 value in the range of [-r, r-1] where r = rank(src), `1` by default. Negative value means counting the dimension backwards from the end. | Optional |
+| [axis](@ref dnnl::graph::op::attr::axis) | Specifies the dimension on which per-channel de-quantization is applied. | s64 | An s64 value in the range of [-r, r-1] where r = rank(src), `1` by default. Negative values mean counting the dimension backwards from the end. | Optional |
+| [group_shape](@ref dnnl::graph::op::attr::group_shape) | Specifies the group shape of the operation. | s64 | An s64 list indicating the group size on the dimensions where grouped quantization is adopted. | Optional |

 ## Execution arguments

@@ -36,15 +47,23 @@ constructing an operation.
 | 1 | `scales` | Required |
 | 2 | `zps` | Optional |

-@note `scales` is a f32 1D tensor to be applied to the de-quantization formula.
-For `qtype` = `per-tensor`, there should be only one element in the scales
-tensor. For `qtype` = `per-channel`, the element number should be equal to the
-element number of src tensor along the dimension axis.
-
-@note `zps` is a 1D tensor with offset values that map to zero. For `qtype` =
-`per-tensor`, there should be only one element in the zps tensor. For `qtype` =
+@note `scales` is a bf16/f16/f32 tensor to be applied to the de-quantization
+formula. For `qtype` = `per-tensor`, there should be only one element in the
+`scales` tensor. For `qtype` = `per-channel`, the element number should be equal
+to the element number of the src tensor along the dimension axis. For
+`qtype` = `per-group`, the `scales` tensor should have the same number of
+dimensions as the `src` tensor. On the dimensions where grouped quantization is
+applied, the dimension should be the number of groups, which equals
+`src_dim` / `group_size`, while other dimensions should match the `src` tensor.
+
+@note `zps` is a tensor with offset values that map to zero. For `qtype` =
+`per-tensor`, there should be only one element in the `zps` tensor. For `qtype` =
 `per-channel`, the element number should be equal to the element number of input
-tensor along the dimension axis. If omitted, zps values are assumed to be zero.
+tensor along the dimension axis. For `qtype` = `per-group`, the `zps` tensor
+should have the same number of dimensions as the `src` tensor. On the dimensions
+where grouped quantization is applied, the dimension should be the number of
+groups, which equals `src_dim` / `group_size`, while other dimensions should
+match the `src` tensor. If omitted, the `zps` values are assumed to be zero.
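The shape rule for per-group `scales` and `zps` can be summarized in a small helper (hypothetical, not part of the oneDNN API): on grouped dimensions the parameter dimension is `src_dim` / `group_size`, elsewhere it matches src.

```python
# Hypothetical helper: expected scales/zps shape for qtype = per-group,
# given the src shape and the group_shape attribute (same rank as src).
def grouped_param_shape(src_shape, group_shape):
    assert len(src_shape) == len(group_shape)
    # A group size of 1 on a dimension means "no grouping" there,
    # so the parameter dimension matches src on that axis.
    return [d // g for d, g in zip(src_shape, group_shape)]

# e.g. a 32x128 src with groups of 32 along axis 1 needs 32x4 scales:
print(grouped_param_shape([32, 128], [1, 32]))  # [32, 4]
```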

 ### Outputs

@@ -58,5 +77,9 @@ DynamicDequantize operation supports the following data type combinations.
 | Src | Dst | Scales | Zps |
 |:----|:----|:-------|:------------|
-| s8 | f32 | f32 | s8, u8, s32 |
-| u8 | f32 | f32 | s8, u8, s32 |
+| s8 | f16, bf16, f32 | f16, bf16, f32 | s8, u8, s32 |
+| u8 | f16, bf16, f32 | f16, bf16, f32 | s8, u8, s32 |
+| s4 | f16, bf16, f32 | f16, bf16, f32 | s4, u4, s32 |
+| u4 | f16, bf16, f32 | f16, bf16, f32 | s4, u4, s32 |
+
+The data types of `scales` and dst are expected to be the same.
