 |[qtype](@ref dnnl::graph::op::attr::qtype) | Specifies which de-quantization type is used. | string |`per_tensor` (default), `per_channel`| Optional |
-|[axis](@ref dnnl::graph::op::attr::axis) | Specifies dimension on which per-channel de-quantization is applied. | s64 | A s64 value in the range of [-r, r-1] where r = rank(src), `1` by default. Negative value means counting the dimension backwards from the end. | Optional |
+|[axis](@ref dnnl::graph::op::attr::axis) | Specifies dimension on which per-channel de-quantization is applied. | s64 | An s64 value in the range of [-r, r-1] where r = rank(src), `1` by default. Negative values mean counting the dimension backwards from the end. | Optional |
+|[group_shape](@ref dnnl::graph::op::attr::group_shape) | Specifies the group shape of an operation. | s64 | An s64 list indicating the group size on the dimensions where grouped quantization is adopted. | Optional |
 
 ## Execution arguments
 
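The `axis` rule in the table above can be illustrated with a small sketch. The helper names `normalize_axis` and `per_channel_scale_count` are hypothetical and are not part of oneDNN; the sketch only assumes what the table states: `axis` lies in [-r, r-1], defaults to `1`, and negative values count from the end.

```python
def normalize_axis(axis: int, rank: int) -> int:
    """Map an axis in [-rank, rank-1] to its non-negative equivalent."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    if not -rank <= axis <= rank - 1:
        raise ValueError(f"axis {axis} is outside [-{rank}, {rank - 1}]")
    # Negative values count dimensions backwards from the end.
    return axis + rank if axis < 0 else axis


def per_channel_scale_count(src_shape, axis: int = 1) -> int:
    """Number of scale elements expected for per-channel de-quantization."""
    return src_shape[normalize_axis(axis, len(src_shape))]
```

For a `src` of shape `[8, 16, 32]`, the default `axis = 1` selects the dimension of size 16, and `axis = -1` selects the last dimension of size 32.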
@@ -36,15 +47,23 @@ constructing an operation.
 | 1 |`scales`| Required |
 | 2 |`zps`| Optional |
 
-@note `scales` is a f32 1D tensor to be applied to the de-quantization formula.
-For `qtype` = `per-tensor`, there should be only one element in the scales
-tensor. For `qtype` = `per-channel`, the element number should be equal to the
-element number of src tensor along the dimension axis.
-
-@note `zps` is a 1D tensor with offset values that map to zero. For `qtype` =
-`per-tensor`, there should be only one element in the zps tensor. For `qtype` =
+@note `scales` is a bf16/f16/f32 tensor to be applied to the de-quantization
+formula. For `qtype` = `per-tensor`, there should be only one element in the
+`scales` tensor. For `qtype` = `per-channel`, the element number should be equal
+to the element number of the src tensor along the dimension axis. For
+`qtype` = `per-group`, the `scales` tensor should have the same number of
+dimensions as the `src` tensor. On the dimensions where grouped quantization is
+applied, the dimension should be the number of groups, which equals
+`src_dim` / `group_size`, while other dimensions should match the `src` tensor.
+
+@note `zps` is a tensor with offset values that map to zero. For `qtype` =
+`per-tensor`, there should be only one element in the `zps` tensor. For `qtype` =
 `per-channel`, the element number should be equal to the element number of input
-tensor along the dimension axis. If omitted, zps values are assumed to be zero.
+tensor along the dimension axis. For `qtype` = `per-group`, the `zps` tensor
+should have the same number of dimensions as the `src` tensor. On the dimensions
+where grouped quantization is applied, the dimension should be the number of
+groups, which equals `src_dim` / `group_size`, while other dimensions should
+match the `src` tensor. If omitted, the `zps` values are assumed to be zero.
 
 ### Outputs
 
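The per-group shape rule in the notes above can be sketched as follows. This is an illustration under stated assumptions, not oneDNN code: the helper names are hypothetical, `group_shape` is assumed to hold one entry per `src` dimension with `1` on dimensions that are not grouped, and the de-quantization formula is assumed to be `dst = (src - zps) * scales`, which this section does not spell out.

```python
def expected_scales_shape(src_shape, group_shape):
    """Shape that `scales` (and `zps`) would need for per-group de-quantization."""
    if len(src_shape) != len(group_shape):
        raise ValueError("group_shape must have one entry per src dimension")
    shape = []
    for dim, group in zip(src_shape, group_shape):
        if group <= 0 or dim % group != 0:
            raise ValueError(f"dimension {dim} is not divisible by group size {group}")
        # Number of groups along this dimension: src_dim / group_size.
        # A group size of 1 leaves the dimension equal to the src dimension.
        shape.append(dim // group)
    return shape


def dequantize_per_group_2d(src, scales, zps, group_shape):
    """Apply the assumed formula dst = (src - zps) * scales to a 2D nested list."""
    g0, g1 = group_shape
    return [
        [
            # Each element uses the scale and zero point of the group it falls in.
            (value - zps[i // g0][j // g1]) * scales[i // g0][j // g1]
            for j, value in enumerate(row)
        ]
        for i, row in enumerate(src)
    ]
```

With a `src` of shape `[4, 64]` and a `group_shape` of `[1, 32]`, the sketch gives a `scales` shape of `[4, 2]`: two groups along the second dimension, and the first dimension unchanged.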
@@ -58,5 +77,9 @@ DynamicDequantize operation supports the following data type combinations.