Commit b7de6c5

committed
update
1 parent 5bb3253 commit b7de6c5

1 file changed

+18
-10
lines changed

rfcs/20241008-graph-api-swish/README.md

+18-10
@@ -27,7 +27,7 @@ networks. For examples:
2727

2828
- PyTorch supports Swish via the SiLU operation [[#7]][7]. The operation does
2929
not support specifying `factor` in the formula.
30-
- OpenVINO support Swish via the Swish operation [[#8]][8]. Unlike PyTorch's
30+
- OpenVINO supports Swish via the Swish operation [[#8]][8]. Unlike PyTorch's
3131
SiLU operation, OpenVINO's Swish also accepts a scalar input `Beta` as the
3232
multiplication `factor` for Sigmoid.
3333
- For ONNX, a PR is in progress to add a Swish operation [[#9]][9].
@@ -37,6 +37,9 @@ networks. For examples:
3737
- cuDNN backend API supports Swish as a mode (`CUDNN_POINTWISE_SWISH_FWD`) of
3838
its Pointwise operation [[#11]][11] and accepts attribute
3939
`CUDNN_ATTR_POINTWISE_SWISH_BETA` as the multiplication `factor`.
40+
- Note that, even though PyTorch has the SiLU operation, many model
41+
scripts still implement Swish as a composition of smaller operations
42+
[[#12]][12].
4043

4144
## Proposals
4245

@@ -76,6 +79,10 @@ swish.finalize();
7679
Pros:
7780
7881
- There is no need to define and maintain a new operation in oneDNN Graph API.
82+
- Composition of smaller operations makes it possible and scalable to extend when
83+
the activation has more variants or flavors in the future.
84+
- The approach of composition of Multiply and Sigmoid is also adopted in models
85+
as mentioned above.
7986
8087
Cons:
8188
@@ -98,7 +105,8 @@ in oneDNN Graph API. The proposed operation schema is as follow:
98105
99106
- Operation Kind: `Swish` (C++), `dnnl_graph_op_swish` (C).
100107
- Input/output: Single input, single output.
101-
- Attribute: `beta` (optional). `beta = 1.f` if not provided.
108+
- Attribute: `alpha` (optional) for the multiplication factor in the formula.
109+
`alpha = 1.f` if not provided.
102110
- Data types: f32, bf16, f16.
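
The semantics implied by the schema above can be sketched in plain C++. This is only a reference sketch, not oneDNN code: it assumes the attribute multiplies the Sigmoid input, i.e. `dst = src * sigmoid(alpha * src)`, and the names `sigmoid_f32`, `swish_scalar`, and `swish_f32` are hypothetical helpers introduced for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid for a single f32 value.
inline float sigmoid_f32(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// Assumed formula: swish(x) = x * sigmoid(alpha * x).
// alpha defaults to 1.f, matching the default in the proposed schema.
inline float swish_scalar(float x, float alpha = 1.0f) {
    return x * sigmoid_f32(alpha * x);
}

// Element-wise Swish over a flat f32 buffer (single input, single output).
std::vector<float> swish_f32(const std::vector<float> &src, float alpha = 1.0f) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = swish_scalar(src[i], alpha);
    return dst;
}
```

With `alpha` left at its default, this reduces to SiLU, which is the case PyTorch exposes.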
103111
104112
With the new operation being defined, a Swish operation can be programmed as
@@ -113,7 +121,7 @@ logical_tensor src = logical_tensor(ID_SRC, dt, shape);
113121
logical_tensor dst = logical_tensor(ID_DST, dt, shape);
114122
115123
op swi = op(ID_SWI, op::kind::Swish, "swi");
116-
swi.set_attr<float>(op::attr::beta, 0.5f); // optional
124+
swi.set_attr<float>(op::attr::alpha, 0.5f); // optional
117125
swi.add_input(src);
118126
swi.add_output(dst);
119127
@@ -139,16 +147,14 @@ Cons:
139147
maintenance effort.
140148
- To some extent, supporting all Sigmoid, Multiply, and Swish operations is kind
141149
of duplication.
150+
- We will need to break the API or add a new operation if the operation formula
151+
changes (e.g., the `factor` is extended from a scalar to a vector or full
152+
tensor) in the future. But with option 1, we just need to define a new pattern
153+
without bloating the API.
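
To illustrate the last point, the following plain C++ sketch mimics the Multiply, Sigmoid, Multiply composition of option 1 and shows that the same three steps cover both a scalar `factor` and a per-element `factor`. It is a sketch under stated assumptions, not oneDNN code: the function names `sigmoid_ref` and `swish_composed` are hypothetical, and the broadcast rule (one element or the same size as `src`) is an assumption made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Logistic sigmoid for a single f32 value.
inline float sigmoid_ref(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// Swish built from smaller steps: Multiply -> Sigmoid -> Multiply.
// Assumption for this sketch: `factor` holds either one element (scalar,
// broadcast to every element) or exactly src.size() elements.
std::vector<float> swish_composed(const std::vector<float> &src,
        const std::vector<float> &factor) {
    const bool is_scalar = factor.size() == 1;
    if (!is_scalar && factor.size() != src.size())
        throw std::invalid_argument("factor must have 1 or src.size() elements");

    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const float f = is_scalar ? factor[0] : factor[i];
        const float scaled = f * src[i];        // Multiply(factor, src)
        const float gate = sigmoid_ref(scaled); // Sigmoid
        dst[i] = src[i] * gate;                 // Multiply(src, gate)
    }
    return dst;
}
```

A single-attribute operation would need a schema change to accept the per-element case, whereas the composed form only changes the shape of one Multiply input.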
142154

143155
## Conclusions
144156

145-
Option 2 is recommended.
146-
147-
oneDNN eltwise primitive and post-op can be used as the implementation of this
148-
operation and fusions.
149-
150-
Benchdnn graph driver needs to be extended to support the validation of this new
151-
operation with reusing eltwise driver as the reference.
157+
(TBD.)
152158

153159
## References
154160

@@ -163,6 +169,7 @@ operation with reusing eltwise driver as the reference.
163169
9. PR for Swish operation in ONNX, https://github.com/onnx/onnx/pull/5964
164170
10. Swish in oneDNN, https://oneapi-src.github.io/oneDNN/dev_guide_eltwise.html
165171
11. Swish in cuDNN, https://docs.nvidia.com/deeplearning/cudnn/latest/api/cudnn-graph-library.html#cudnnpointwisemode-t
172+
12. Swish implementation in Huggingface repository, https://github.com/search?q=org%3Ahuggingface%20swish&type=code
166173

167174
[1]: https://arxiv.org/abs/1710.05941v1
168175
[2]: https://arxiv.org/abs/1606.08415
@@ -175,3 +182,4 @@ operation with reusing eltwise driver as the reference.
175182
[9]: https://github.com/onnx/onnx/pull/5964
176183
[10]: https://oneapi-src.github.io/oneDNN/dev_guide_eltwise.html
177184
[11]: https://docs.nvidia.com/deeplearning/cudnn/latest/api/cudnn-graph-library.html#cudnnpointwisemode-t
185+
[12]: https://github.com/search?q=org%3Ahuggingface%20swish&type=code
