@@ -4,13 +4,13 @@ Fusion Patterns {#dev_guide_graph_fusion_patterns}
## Overview
The following fusion patterns are subgraphs that the oneDNN Graph API recognizes
- as candidate for fusion. The patterns are described using oneDNN Graph
+ as candidates for fusion. The patterns are described using oneDNN Graph
operation (op) names with the following convention.
@note oneDNN Graph performs limited input validation to minimize the performance
overheads. The application is responsible for sanitizing inputs passed to the
- library. For large u8 or s8 inputs may lead to accumulator overflow, you can use
- floating point patterns instead of quantized patterns.
+ library. Because large `u8` or `s8` inputs may lead to accumulator overflow, you
+ can use floating-point patterns instead of quantized patterns.
` "+" ` describes a chain of two ops. The preceding op produces an output tensor,
which is consumed by the following op as its first operand.
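As a concrete illustration of the `"+"` convention, the sketch below builds a
`MatMul + ReLU` chain with the C++ Graph API so that the MatMul output tensor is
the first operand of the following ReLU op. This is a minimal sketch, not a
complete application; the tensor ids, shapes, and debug names are illustrative
assumptions only.

```cpp
#include <cstdio>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    graph g(dnnl::engine::kind::cpu);

    const auto f32 = logical_tensor::data_type::f32;
    const auto strided = logical_tensor::layout_type::strided;

    // Illustrative ids and shapes; any consistent values would do.
    logical_tensor src {0, f32, {8, 64}, strided};
    logical_tensor wei {1, f32, {64, 32}, strided};
    logical_tensor mm_out {2, f32, {8, 32}, strided};
    logical_tensor relu_out {3, f32, {8, 32}, strided};

    // "MatMul + ReLU": the MatMul output is the first operand of ReLU.
    op matmul {0, op::kind::MatMul, {src, wei}, {mm_out}, "matmul"};
    op relu {1, op::kind::ReLU, {mm_out}, {relu_out}, "relu"};

    g.add_op(matmul);
    g.add_op(relu);
    g.finalize();

    // If the chain is recognized, both ops land in a single partition.
    std::printf("partitions: %zu\n", g.get_partitions().size());
    return 0;
}
```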
@@ -35,7 +35,7 @@ the producer and consumer relation within one graph partition. For example,
A\f$_{>t1}\f$+B+C\f$_{<t1}\f$ refers
to the pattern starting with A followed by B and C, where C takes an implicit input
tensor from B and an extra tensor t1 output from A. `">"` refers to the output
- tensor, and `"<"` for input tensor. Input and output tensor between neighbor
+ tensor, and `"<"` to an input tensor. Input and output tensors between neighboring
ops are not explicitly marked, for example, B consumes t1 implicitly in the
example above.
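The sketch below instantiates A\f$_{>t1}\f$+B+C\f$_{<t1}\f$ with the purely
illustrative choice A = MatMul, B = ReLU, and C = Add: the MatMul output t1
feeds ReLU as its direct consumer and is also wired into Add as the extra,
explicitly marked input. Ids and shapes are again assumptions for illustration.

```cpp
#include <cstdio>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    graph g(dnnl::engine::kind::cpu);
    const auto f32 = logical_tensor::data_type::f32;
    const auto strided = logical_tensor::layout_type::strided;

    logical_tensor src {0, f32, {8, 32}, strided};
    logical_tensor wei {1, f32, {32, 32}, strided};
    logical_tensor t1 {2, f32, {8, 32}, strided}; // A's marked output ">t1"
    logical_tensor relu_out {3, f32, {8, 32}, strided};
    logical_tensor add_out {4, f32, {8, 32}, strided};

    op a {0, op::kind::MatMul, {src, wei}, {t1}, "A"};
    op b {1, op::kind::ReLU, {t1}, {relu_out}, "B"}; // consumes t1 implicitly
    op c {2, op::kind::Add, {relu_out, t1}, {add_out}, "C"}; // "<t1" extra input

    g.add_op(a);
    g.add_op(b);
    g.add_op(c);
    g.finalize();
    std::printf("partitions: %zu\n", g.get_partitions().size());
    return 0;
}
```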
@@ -55,7 +55,7 @@ inputs. In the example A\f$_{>t1}\f$+B+C\f$_{<t1}\f$, A's inputs are
regarded as implicit graph partition inputs, and if B is a binary operation, the
second input tensor is an implicit graph partition input.
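Continuing the sketch above, one way to observe which tensors became implicit
partition inputs is to query the ports of the returned partitions. The helper
below is a hypothetical illustration assuming the `get_in_ports()` and
`get_out_ports()` queries of `dnnl::graph::partition`.

```cpp
#include <cstdio>
#include "oneapi/dnnl/dnnl_graph.hpp"

// Hypothetical helper: print the input/output ports of every partition
// returned for a finalized graph, such as `g` from the sketch above.
void dump_partition_ports(dnnl::graph::graph &g) {
    for (const auto &p : g.get_partitions()) {
        // For A_{>t1}+B+C_{<t1}, A's own inputs show up here as implicit
        // graph partition inputs even though the notation never lists them.
        for (const auto &lt : p.get_in_ports())
            std::printf("partition input id: %zu\n", lt.get_id());
        for (const auto &lt : p.get_out_ports())
            std::printf("partition output id: %zu\n", lt.get_id());
    }
}
```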
- The following categories will be used in describing fusion pattern.
+ The following categories will be used in describing a fusion pattern.
Unary = [Abs | Clamp | Elu | Exp | GELU | HardSwish | LeakyReLU |
Log | Sigmoid | SoftPlus | Pow | ReLU | Round | Sqrt | Square | Tanh]
@@ -105,56 +105,3 @@ ReduceProd | ReduceSum]
| :--------| :-----------------------------|
| ConvolutionBackwardWeights + BiasAddBackward\f$_{>out}\f$ | N/A |
| ReLUBackward + BatchNormTrainingBackward\f$_{>out}\f$ | N/A |
-
- All the above fusion patterns are supported by default.
-
- ## Aggressive Fusion Patterns
- Aggressive fusion patterns also follow the pattern description convention
- defined in the [Fusion Patterns](@ref fusion_patterns) section.
-
- @note Aggressive fusion patterns are only supported when
- [Graph Compiler](@ref dev_guide_graph_compiler) is enabled.
-
- The following categories will also be used to describe aggressive fusion
- patterns.
-
- - ReshapeTranspose = [StaticReshape + StaticTranspose\f$^{1-2}\f$]
-
- - Activation = [ReLU \| Sigmoid \| GELU]
-
- - ActivationBackward = [ReLUBackward \| SigmoidBackward \| GELUBackward]
-
- ### Inference
-
- #### Floating Point Patterns
-
- | Pattern | Description |
- | :--------| :-----------------------------|
- | MatMul + [Multiply \| Divide] + Add + Softmax + MatMul + StaticTranspose + Reorder\f$_{>out}\f$ | Multi-head Attention. This pattern is widely used in models containing encoder-decoder structures, for example BERT. |
- | ReshapeTranspose\f$_{>t1}\f$, ReshapeTranspose\f$_{>t2}\f$, ReshapeTranspose + MatMul\f$_{<t1}\f$ + [Multiply \| Divide] + Add + Softmax + MatMul\f$_{<t2}\f$ + StaticTranspose + StaticReshape\f$_{>out}\f$ | Multi-head Attention. |
- | MatMul + Activation\f$_{>t1}\f$, [MatMul\f$_{<t1}\f$ + Activation\f$_{>t1}\f$]\f$^{0-4}\f$, MatMul\f$_{<t1}\f$ + Activation\f$_{>out}\f$ | Multi-layer Perceptron. This pattern is widely used in recommendation models, for example DLRM. |
- | [Convolution + BiasAdd\f$^{?}\f$ + ReLU]\f$^{1-3}\f$ + Convolution + BiasAdd\f$^{?}\f$ + Add + ReLU\f$_{>out}\f$ | Identical Bottleneck. Enabled only in the single-thread runtime scenario. This pattern is widely used in Convolutional Neural Networks, for example ResNet. |
- | Convolution + BiasAdd\f$^{?}\f$\f$_{>t1}\f$, [Convolution + BiasAdd\f$^{?}\f$ + ReLU]\f$^{1-3}\f$ + Convolution + BiasAdd\f$^{?}\f$ + Add\f$_{<t1}\f$ + ReLU\f$_{>out}\f$ | Convolutional Bottleneck. Enabled only in the single-thread runtime scenario. This pattern is widely used in Convolutional Neural Networks, for example ResNet. |
-
- #### Quantized Patterns
-
- | Pattern | Description |
- | :--------| :-----------------------------|
- | Dequantize\f$_{>t1}\f$, Dequantize\f$_{>t2}\f$, Dequantize + MatMul\f$_{<t1}\f$ + [Multiply \| Divide] + Add + Softmax + Quantize + Dequantize + MatMul\f$_{<t2}\f$ + StaticTranspose + Reorder + Quantize\f$_{>out}\f$ | Quantized Multi-head Attention. |
- | Dequantize + ReshapeTranspose\f$_{>t1}\f$, Dequantize + ReshapeTranspose\f$_{>t2}\f$, Dequantize + MatMul\f$_{<t1}\f$ + [Multiply \| Divide] + Add + Softmax + Quantize + Dequantize + MatMul\f$_{<t2}\f$ + StaticTranspose + StaticReshape + Quantize\f$_{>out}\f$ | Quantized Multi-head Attention. |
- | Dequantize\f$_{>t1}\f$, Dequantize + MatMul\f$_{<t1}\f$ + Activation + Quantize\f$_{>t2}\f$, [Dequantize\f$_{>t3}\f$, Dequantize\f$_{<t2}\f$ + MatMul\f$_{<t3}\f$ + Activation + Quantize\f$_{>t2}\f$]\f$^{0-4}\f$, Dequantize\f$_{>t4}\f$, Dequantize\f$_{<t2}\f$ + MatMul\f$_{<t4}\f$ + Activation + Quantize\f$_{>out}\f$ | Quantized Multi-layer Perceptron. |
- | Dequantize\f$_{>t2}\f$, Dequantize\f$_{>t3}\f$, [Dequantize\f$_{>t1}\f$, Dequantize + Convolution\f$_{<t1}\f$ + BiasAdd\f$^{?}\f$ + ReLU + Quantize]\f$^{1-3}\f$ + Dequantize + Convolution\f$_{<t2}\f$ + BiasAdd\f$^{?}\f$ + Add\f$_{<t3}\f$ + ReLU + Quantize\f$_{>out}\f$ | Quantized Identical Bottleneck. Enabled only in the single-thread runtime scenario. |
- | [Dequantize\f$_{>t1}\f$, Dequantize + Convolution\f$_{<t1}\f$ + BiasAdd\f$^{?}\f$ + Quantize + Dequantize]\f$_{>t2}\f$, Dequantize\f$_{>t4}\f$, [Dequantize\f$_{>t3}\f$, Dequantize + Convolution\f$_{<t3}\f$ + BiasAdd\f$^{?}\f$ + ReLU + Quantize]\f$^{1-3}\f$ + Dequantize + Convolution\f$_{<t4}\f$ + BiasAdd\f$^{?}\f$ + Add\f$_{<t2}\f$ + ReLU + Quantize\f$_{>out}\f$ | Quantized Convolutional Bottleneck. Enabled only in the single-thread runtime scenario. |
-
- ### Training
-
- | Pattern | Description |
- | :--------| :-----------------------------|
- | Dequantize\f$_{>t1}\f$, Dequantize\f$_{>t2}\f$, Dequantize + MatMul\f$_{<t1}\f$ + [Multiply \| Divide] + Add + Softmax + Quantize + Dequantize + MatMul\f$_{<t2}\f$ + StaticTranspose + Reorder + Quantize\f$_{>out}\f$ | Multi-head Attention Training Forward Pattern. |
- | StaticReshape + StaticTranspose\f$_{>t1}\f$ + MatMul + Multiply\f$_{>t2}\f$ + Subtract\f$_{<t3}\f$ + Multiply\f$^{?}\f$ + [Multiply \| Divide]\f$_{>t4}\f$ + MatMul\f$_{>out1}\f$, Multiply\f$_{<t2}\f$ + ReduceSum\f$_{>t3}\f$, MatMul\f$_{<t1,>out2}\f$, MatMul\f$_{<t4,>out3}\f$ | Multi-head Attention Training Backward Pattern. |
- | MatMul\f$_{>out1}\f$ + Activation\f$_{>t1,>out2}\f$, [MatMul\f$_{<t1,>out3}\f$ + Activation\f$_{>t1,>out4}\f$]\f$^{0-4}\f$, MatMul\f$_{<t1,>out5}\f$ + Activation\f$_{>out6}\f$ | Multi-layer Perceptron Training Forward Pattern. |
- | StaticTranspose\f$^{?}\f$\f$_{>t0}\f$, ActivationBackward\f$_{>t2}\f$ + MatMul\f$_{<t0,>t1}\f$, ReduceSum\f$^{?}\f$\f$_{<t2,>out1}\f$, StaticTranspose\f$^{?}\f$ + MatMul\f$_{<t2,>out2}\f$, [StaticTranspose\f$^{?}\f$\f$_{>t3}\f$, ActivationBackward\f$_{>t4,<t1}\f$ + MatMul\f$_{<t3,>t1}\f$, ReduceSum\f$^{?}\f$\f$_{<t4,>out3}\f$, StaticTranspose\f$^{?}\f$ + MatMul\f$_{<t4,>out4}\f$]\f$^{0-4}\f$, StaticTranspose\f$^{?}\f$\f$_{>t5}\f$, ActivationBackward\f$_{>t6,<t1}\f$ + MatMul\f$_{<t5,>out5}\f$, ReduceSum\f$^{?}\f$\f$_{<t6,>out6}\f$, StaticTranspose\f$^{?}\f$ + MatMul\f$_{<t6,>out7}\f$ | Multi-layer Perceptron Training Backward Pattern. |
- | Convolution\f$_{>out1}\f$ + BatchNormForwardTraining\f$_{>out2}\f$ + ReLU\f$_{>out3}\f$ + Convolution\f$_{>out4}\f$ + BatchNormForwardTraining\f$_{>out5}\f$ + ReLU\f$_{>out6}\f$ + Convolution\f$_{>out7}\f$ + BatchNormForwardTraining\f$_{>out8}\f$ + Add + ReLU\f$_{>out9}\f$ | Identical Bottleneck Training Forward Pattern. |
- | Convolution\f$_{>out1}\f$ + BatchNormForwardTraining\f$_{>t1,>out2}\f$, Convolution\f$_{>out3}\f$ + BatchNormForwardTraining\f$_{>out4}\f$ + ReLU\f$_{>out5}\f$ + Convolution\f$_{>out6}\f$ + BatchNormForwardTraining\f$_{>out7}\f$ + ReLU\f$_{>out8}\f$ + Convolution\f$_{>out9}\f$ + BatchNormForwardTraining\f$_{>out10}\f$ + Add\f$_{<t1}\f$ + ReLU\f$_{>out11}\f$ | Convolutional Bottleneck Training Forward Pattern. |
- | ReLUBackward\f$_{>t1}\f$ + BatchNormTrainingBackward\f$_{>t2,>out1}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t3,>out2}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t4,>out3}\f$ + ConvolutionBackwardData + Add\f$_{<t1,>out4}\f$, ConvolutionBackwardWeights\f$_{<t2,>out5}\f$, ConvolutionBackwardWeights\f$_{<t3,>out6}\f$, ConvolutionBackwardWeights\f$_{<t4,>out7}\f$ | Identical Bottleneck Training Backward Pattern. |
- | ReLUBackward\f$_{>t1}\f$ + BatchNormTrainingBackward\f$_{>t2,>out1}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t3,>out2}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t4,>out3}\f$ + ConvolutionBackwardData + Add\f$_{<t6,>out4}\f$, BatchNormTrainingBackward\f$_{<t1,>t5,>out5}\f$ + ConvolutionBackwardData\f$_{>t6}\f$, ConvolutionBackwardWeights\f$_{<t2,>out6}\f$, ConvolutionBackwardWeights\f$_{<t3,>out7}\f$, ConvolutionBackwardWeights\f$_{<t4,>out8}\f$, ConvolutionBackwardWeights\f$_{<t5,>out9}\f$ | Convolutional Bottleneck Training Backward Pattern. |