@@ -27,7 +27,7 @@ networks. For examples:
- PyTorch supports Swish via the SiLU operation [[#7]][7]. The operation does
not support specifying `factor` in the formula.
- - OpenVINO support Swish via the Swish operation [[#8]][8]. Unlike PyTorch's
+ - OpenVINO supports Swish via the Swish operation [[#8]][8]. Unlike PyTorch's
SiLU operation, OpenVINO's Swish also accepts a scalar input `Beta` as the
multiplication `factor` for Sigmoid.
- For ONNX, a PR to add a Swish operation is in progress [[#9]][9].
@@ -37,6 +37,9 @@ networks. For examples:
- cuDNN backend API supports Swish as a mode (`CUDNN_POINTWISE_SWISH_FWD`) of
its Pointwise operation [[#11]][11] and accepts attribute
`CUDNN_ATTR_POINTWISE_SWISH_BETA` as the multiplication `factor`.
+ - Note that although PyTorch has a SiLU operation, many model scripts still
+ implement Swish as a composition of smaller operations
+ [[#12]][12].

## Proposals

@@ -76,6 +79,10 @@ swish.finalize();
Pros:

- There is no need to define and maintain a new operation in oneDNN Graph API.
+ - Composing smaller operations makes it possible to extend, in a scalable way,
+ when the activation gains more variants or flavors in the future.
+ - The composition of Multiply and Sigmoid is also what many model scripts
+ use, as mentioned above.
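
The composition in option 1 can be illustrated with a plain C++ sketch in which each step is a separate element-wise pass, mirroring a Sigmoid operation fed into a Multiply operation. This is only an illustration under stated assumptions: the names `tensor`, `sigmoid_op`, `multiply_op`, and `swish_composed` are hypothetical and are not oneDNN Graph API calls, and the `factor` argument (a single value or one value per element) is an assumed extension that shows how a variant could be expressed by adding one more Multiply.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat f32 tensor used only for this illustration.
using tensor = std::vector<float>;

// Hypothetical stand-in for a Sigmoid operation: y = 1 / (1 + exp(-x)).
inline tensor sigmoid_op(const tensor &x) {
    tensor y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = 1.0f / (1.0f + std::exp(-x[i]));
    return y;
}

// Hypothetical stand-in for a Multiply operation. The second input must have
// either one element (broadcast) or the same number of elements as the first.
inline tensor multiply_op(const tensor &a, const tensor &b) {
    assert(b.size() == 1 || b.size() == a.size());
    tensor y(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        y[i] = a[i] * (b.size() == 1 ? b[0] : b[i]);
    return y;
}

// swish(x) = Multiply(x, Sigmoid(Multiply(x, factor))).
// With the default factor of 1 this reduces to Multiply(x, Sigmoid(x)).
inline tensor swish_composed(const tensor &x, const tensor &factor = tensor{1.0f}) {
    return multiply_op(x, sigmoid_op(multiply_op(x, factor)));
}
```

Calling `swish_composed(x)` gives the SiLU form, while `swish_composed(x, factor)` with a per-element `factor` shows how a variant needs only an additional input to an existing operation rather than a new operation kind.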

Cons:

@@ -98,7 +105,8 @@ in oneDNN Graph API. The proposed operation schema is as follow:
- Operation Kind: `Swish` (C++), `dnnl_graph_op_swish` (C).
- Input/output: Single input, single output.
- - Attribute: `beta` (optional). `beta = 1.f` if not provided.
+ - Attribute: `alpha` (optional) for the multiplication factor in the formula.
+ `alpha = 1.f` if not provided.
- Data types: f32, bf16, f16.
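
For reference, the element-wise formula behind this schema, swish(x) = x * sigmoid(alpha * x), can be sketched in plain C++. This is a minimal illustration rather than oneDNN code: the function names `swish_scalar` and `swish_reference` are hypothetical, and the sketch covers f32 data only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: swish(x) = x * sigmoid(alpha * x)
//                               = x / (1 + exp(-alpha * x)).
// alpha defaults to 1, matching the default stated in the schema.
inline float swish_scalar(float x, float alpha = 1.0f) {
    return x / (1.0f + std::exp(-alpha * x));
}

// Hypothetical reference: applies swish element-wise from src into dst.
// Both buffers must have the same number of elements.
inline void swish_reference(const std::vector<float> &src,
        std::vector<float> &dst, float alpha = 1.0f) {
    assert(src.size() == dst.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = swish_scalar(src[i], alpha);
}
```

With `alpha` left at its default, the result matches SiLU; a value such as `0.5f` scales the Sigmoid input before the multiplication.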

With the new operation being defined, a Swish operation can be programmed as
@@ -113,7 +121,7 @@ logical_tensor src = logical_tensor(ID_SRC, dt, shape);
logical_tensor dst = logical_tensor(ID_DST, dt, shape);

op swi = op(ID_SWI, op::kind::Swish, "swi");
- swi.set_attr<float>(op::attr::beta, 0.5f); // optional
+ swi.set_attr<float>(op::attr::alpha, 0.5f); // optional
swi.add_input(src);
swi.add_output(dst);

@@ -139,16 +147,14 @@ Cons:
maintenance effort.
- To some extent, supporting all of the Sigmoid, Multiply, and Swish operations
is a kind of duplication.
+ - We will need to break the API or add a new operation if the operation formula
+ changes (e.g., the `factor` is extended from a scalar to a vector or full
+ tensor) in the future. But with option 1, we just need to define a new pattern
+ without bloating the API.

## Conclusions

- Option 2 is recommended.
-
- oneDNN eltwise primitive and post-op can be used as the implementation of this
- operation and fusions.
-
- Benchdnn graph driver needs to be extended to support the validation of this new
- operation with reusing eltwise driver as the reference.
+ (TBD.)

## References

@@ -163,6 +169,7 @@ operation with reusing eltwise driver as the reference.
9. PR for Swish operation in ONNX, https://github.com/onnx/onnx/pull/5964
10. Swish in oneDNN, https://oneapi-src.github.io/oneDNN/dev_guide_eltwise.html
11. Swish in cuDNN, https://docs.nvidia.com/deeplearning/cudnn/latest/api/cudnn-graph-library.html#cudnnpointwisemode-t
+ 12. Swish implementation in Huggingface repository, https://github.com/search?q=org%3Ahuggingface%20swish&type=code

[1]: https://arxiv.org/abs/1710.05941v1
[2]: https://arxiv.org/abs/1606.08415
@@ -175,3 +182,4 @@ operation with reusing eltwise driver as the reference.
[9]: https://github.com/onnx/onnx/pull/5964
[10]: https://oneapi-src.github.io/oneDNN/dev_guide_eltwise.html
[11]: https://docs.nvidia.com/deeplearning/cudnn/latest/api/cudnn-graph-library.html#cudnnpointwisemode-t
+ [12]: https://github.com/search?q=org%3Ahuggingface%20swish&type=code