Commit 6e1f3fd

doc: graph: patterns: fix words and remove legacy graph compiler patterns

1 parent f4b3a24 commit 6e1f3fd
1 file changed: +5 -58 lines changed
doc/graph/fusion_patterns/fusion_patterns.md (+5 -58)
@@ -4,13 +4,13 @@ Fusion Patterns {#dev_guide_graph_fusion_patterns}
 ## Overview
 
 The following fusion patterns are subgraphs that the oneDNN Graph API recognizes
-as candidate for fusion. The patterns are described using oneDNN Graph
+as candidates for fusion. The patterns are described using oneDNN Graph
 operation (op) names with the following convention.
 
 @note oneDNN Graph performs limited input validation to minimize the performance
 overheads. The application is responsible for sanitizing inputs passed to the
-library. For large u8 or s8 inputs may lead to accumulator overflow, you can use
-floating point patterns instead of quantized patterns.
+library. Because large `u8` or `s8` inputs may lead to accumulator overflow, you
+can use floating-point patterns instead of quantized patterns.
 
 `"+"` describes a chain of two ops. The preceding op produces an output tensor,
 which is consumed by the following op as its first operand.
@@ -35,7 +35,7 @@ the producer and consumer relation within one graph partition. For example,
 A\f$_{>t1}\f$+B+C\f$_{<t1}\f$ refers
 to the pattern started with A followed by B and C, and C takes an implicit input
 tensor from B and an extra tensor t1 output from A. `">"` refers to the output
-tensor, and `"<"` for input tensor. Input and output tensor between neighbor
+tensor, and `"<"` for input tensor. Input and output tensors between neighbor
 ops are not explicitly marked, for example, B consumes t1 implicitly in the
 example above.
 
@@ -55,7 +55,7 @@ inputs. In the example A\f$_{>t1}\f$+B+C\f$_{<t1}\f$, A's inputs are
 regarded as implicit graph partition inputs, and if B is a binary operation, the
 second input tensor is an implicit graph partition input.
 
-The following categories will be used in describing fusion pattern.
+The following categories will be used in describing a fusion pattern.
 
 Unary = [Abs | Clamp | Elu | Exp | GELU | HardSwish | LeakyReLU |
 Log | Sigmoid | SoftPlus | Pow | ReLU | Round | Sqrt | Square | Tanh]
@@ -105,56 +105,3 @@ ReduceProd | ReduceSum]
 |:--------|:-----------------------------|
 | ConvolutionBackwardWeights + BiasAddBackward\f$_{>out}\f$ | N/A |
 | ReLUBackward + BatchNormTrainingBackward\f$_{>out}\f$ |N/A |
-
-All the above fusion patterns are supported by default.
-
-## Aggressive Fusion Patterns
-Aggressive fusion patterns also follow the pattern description convention
-defined in the [Fusion Patterns](@ref fusion_patterns) section.
-
-@note Aggressive fusion patterns are only supported when
-[Graph Compiler](@ref dev_guide_graph_compiler) is enabled.
-
-The following categories will also be used to describe aggressive fusion
-patterns.
-
-- ReshapeTranspose = [StaticReshape + StaticTranspose\f$^{1-2}\f$]
-
-- Activation = [ReLU \| Sigmoid \| GELU]
-
-- ActivationBackward = [ReLUBackward \| SigmoidBackward \| GELUBackward]
-
-### Inference
-
-#### Floating Point Patterns
-
-| Pattern | Description |
-|:--------|:-----------------------------|
-| MatMul + [Multiply \| Divide] + Add + Softmax + MatMul + StaticTranspose + Reorder\f$_{>out}\f$ | Multi-head Attention. This pattern is widely used in models containing encoder-decoder structures, for example BERT. |
-| ReshapeTranspose\f$_{>t1}\f$, ReshapeTranspose\f$_{>t2}\f$, ReshapeTranspose + MatMul\f$_{<t1}\f$ + [Multiply \| Divide] + Add + Softmax + MatMul\f$_{<t2}\f$ + StaticTranspose + StaticReshape\f$_{>out}\f$ | Multi-head Attention. |
-| MatMul + Activation\f$_{>t1}\f$, [MatMul\f$_{<t1}\f$ + Activation\f$_{>t1}\f$]\f$^{0-4}\f$, MatMul\f$_{<t1}\f$ + Activation\f$_{>out}\f$ | Multi-layer Perceptron. This pattern is widely used in recommendation models, for example DLRM. |
-| [Convolution + BiasAdd\f$^{?}\f$ + ReLU]\f$^{1-3}\f$ + Convolution + BiasAdd\f$^{?}\f$ + Add + ReLU\f$_{>out}\f$ | Identical Bottleneck. Enabled only in single thread runtime scenario. This pattern is widely used in Convolution Neural Networks, for example ResNet. |
-| Convolution + BiasAdd\f$^{?}\f$\f$_{>t1}\f$, [Convolution + BiasAdd\f$^{?}\f$ + ReLU]\f$^{1-3}\f$ + Convolution + BiasAdd\f$^{?}\f$ + Add\f$_{<t1}\f$ + ReLU\f$_{>out}\f$ | Convolutional Bottleneck. Enabled only in single thread runtime scenario. This pattern is widely used in Convolution Neural Networks, for example ResNet. |
-
-#### Quantized Patterns
-
-| Pattern | Description |
-|:--------|:-----------------------------|
-| Dequantize\f$_{>t1}\f$, Dequantize\f$_{>t2}\f$, Dequantize + MatMul\f$_{<t1}\f$ + [Multiply \| Divide] + Add + Softmax + Quantize + Dequantize + MatMul\f$_{<t2}\f$ + StaticTranspose + Reorder + Quantize\f$_{>out}\f$ | Quantized Multi-head Attention. |
-| Dequantize + ReshapeTranspose\f$_{>t1}\f$, Dequantize + ReshapeTranspose\f$_{>t2}\f$, Dequantize + MatMul\f$_{<t1}\f$ + [Multiply \| Divide] + Add + Softmax + Quantize + Dequantize + MatMul\f$_{<t2}\f$ + StaticTranspose + StaticReshape + Quantize\f$_{>out}\f$ | Quantized Multi-head Attention. |
-| Dequantize\f$_{>t1}\f$, Dequantize + MatMul\f$_{<t1}\f$ + Activation + Quantize\f$_{>t2}\f$, [Dequantize\f$_{>t3}\f$, Dequantize\f$_{<t2}\f$ + MatMul\f$_{<t3}\f$ + Activation + Quantize\f$_{>t2}\f$]\f$^{0-4}\f$, Dequantize\f$_{>t4}\f$, Dequantize\f$_{<t2}\f$ + MatMul\f$_{<t4}\f$ + Activation + Quantize\f$_{>out}\f$ | Quantized Multi-layer Perceptron. |
-| Dequantize\f$_{>t2}\f$, Dequantize\f$_{>t3}\f$, [Dequantize\f$_{>t1}\f$, Dequantize + Convolution\f$_{<t1}\f$ + BiasAdd\f$^{?}\f$ + ReLU + Quantize]\f$^{1-3}\f$ + Dequantize + Convolution\f$_{<t2}\f$ + BiasAdd\f$^{?}\f$ + Add\f$_{<t3}\f$ + ReLU + Quantize\f$_{>out}\f$ | Quantized Identical Bottleneck. Enabled only in single thread runtime scenario. |
-| [Dequantize\f$_{>t1}\f$, Dequantize + Convolution\f$_{<t1}\f$ + BiasAdd\f$^{?}\f$ + Quantize + Dequantize]\f$_{>t2}\f$, Dequantize\f$_{>t4}\f$, [Dequantize\f$_{>t3}\f$, Dequantize + Convolution\f$_{<t3}\f$ + BiasAdd\f$^{?}\f$ + ReLU + Quantize]\f$^{1-3}\f$ + Dequantize + Convolution\f$_{<t4}\f$ + BiasAdd\f$^{?}\f$ + Add\f$_{<t2}\f$ + ReLU + Quantize\f$_{>out}\f$ | Quantized Convolutional Bottleneck. Enabled only in single thread runtime scenario. |
-
-### Training
-
-| Pattern | Description |
-|:--------|:-----------------------------|
-| Dequantize\f$_{>t1}\f$, Dequantize\f$_{>t2}\f$, Dequantize + MatMul\f$_{<t1}\f$ + [Multiply \| Divide] + Add + Softmax + Quantize + Dequantize + MatMul\f$_{<t2}\f$ + StaticTranspose + Reorder + Quantize\f$_{>out}\f$ | Multi-head Attention Training Forward Pattern. |
-| StaticReshape + StaticTranspose\f$_{>t1}\f$ + MatMul + Multiply\f$_{>t2}\f$ + Subtract\f$_{<t3}\f$ + Multiply\f$^{?}\f$ + [Multiply \| Divide]\f$_{>t4}\f$ + MatMul\f$_{>out1}\f$, Multiply\f$_{<t2}\f$ + ReduceSum\f$_{>t3}\f$, MatMul\f$_{<t1,>out2}\f$, MatMul\f$_{<t4,>out3}\f$ | Multi-head Attention Training Backward Pattern. |
-| MatMul\f$_{>out1}\f$ + Activation\f$_{>t1,>out2}\f$, [MatMul\f$_{<t1,>out3}\f$ + Activation\f$_{>t1,>out4}\f$]\f$^{0-4}\f$, MatMul\f$_{<t1,>out5}\f$ + Activation\f$_{>out6}\f$ | Multi-layer Perceptron Training Forward Pattern. |
-| StaticTranspose\f$^{?}\f$\f$_{>t0}\f$, ActivationBackward\f$_{>t2}\f$ + MatMul\f$_{<t0,>t1}\f$, ReduceSum\f$^{?}\f$\f$_{<t2,>out1}\f$, StaticTranspose\f$^{?}\f$ + MatMul\f$_{<t2,>out2}\f$, [StaticTranspose\f$^{?}\f$\f$_{>t3}\f$, ActivationBackward\f$_{>t4,<t1}\f$ + MatMul\f$_{<t3,>t1}\f$, ReduceSum\f$^{?}\f$\f$_{<t4,>out3}\f$, StaticTranspose\f$^{?}\f$ + MatMul\f$_{<t4,>out4}\f$]\f$^{0-4}\f$, StaticTranspose\f$^{?}\f$\f$_{>t5}\f$, ActivationBackward\f$_{>t6,<t1}\f$ + MatMul\f$_{<t5,>out5}\f$, ReduceSum\f$^{?}\f$\f$_{<t6,>out6}\f$, StaticTranspose\f$^{?}\f$ + MatMul\f$_{<t6,>out7}\f$ | Multi-layer Perceptron Training Backward Pattern. |
-| Convolution\f$_{>out1}\f$ + BatchNormForwardTraining\f$_{>out2}\f$ + ReLU\f$_{>out3}\f$ + Convolution\f$_{>out4}\f$ + BatchNormForwardTraining\f$_{>out5}\f$ + ReLU\f$_{>out6}\f$ + Convolution\f$_{>out7}\f$ + BatchNormForwardTraining\f$_{>out8}\f$ + Add + ReLU\f$_{>out9}\f$ | Identical Bottleneck Training Forward Pattern. |
-| Convolution\f$_{>out1}\f$ + BatchNormForwardTraining\f$_{>t1,>out2}\f$, Convolution\f$_{>out3}\f$ + BatchNormForwardTraining\f$_{>out4}\f$ + ReLU\f$_{>out5}\f$ + Convolution\f$_{>out6}\f$ + BatchNormForwardTraining\f$_{>out7}\f$ + ReLU\f$_{>out8}\f$ + Convolution\f$_{>out9}\f$ + BatchNormForwardTraining\f$_{>out10}\f$ + Add\f$_{<t1}\f$ + ReLU\f$_{>out11}\f$ | Convolutional Bottleneck Training Forward Pattern. |
-| ReLUBackward\f$_{>t1}\f$ + BatchNormTrainingBackward\f$_{>t2,>out1}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t3,>out2}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t4,>out3}\f$ + ConvolutionBackwardData + Add\f$_{<t1,>out4}\f$, ConvolutionBackwardWeights\f$_{<t2,>out5}\f$, ConvolutionBackwardWeights\f$_{<t3,>out6}\f$, ConvolutionBackwardWeights\f$_{<t4,>out7}\f$ | Identical Bottleneck Training Backward Pattern. |
-| ReLUBackward\f$_{>t1}\f$ + BatchNormTrainingBackward\f$_{>t2,>out1}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t3,>out2}\f$ + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward\f$_{>t4,>out3}\f$ + ConvolutionBackwardData + Add\f$_{<t6,>out4}\f$, BatchNormTrainingBackward\f$_{<t1,>t5,>out5}\f$ + ConvolutionBackwardData\f$_{>t6}\f$, ConvolutionBackwardWeights\f$_{<t2,>out6}\f$, ConvolutionBackwardWeights\f$_{<t3,>out7}\f$, ConvolutionBackwardWeights\f$_{<t4,>out8}\f$, ConvolutionBackwardWeights\f$_{<t5,>out9}\f$ | Convolutional Bottleneck Training Backward Pattern. |
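The notation this patch retains describes subgraphs that an application assembles through the oneDNN Graph API: each name in a chain such as MatMul + ReLU is an op added to a `dnnl::graph::graph`, and a recognized fusion pattern is returned by partitioning as a single partition. Below is a minimal C++ sketch of that flow; the tensor ids, shapes, and the expectation of exactly one partition are illustrative assumptions, not part of this commit.

```cpp
#include <vector>

#include "oneapi/dnnl/dnnl_graph.hpp"

int main() {
    using namespace dnnl::graph;
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    // Logical tensors take an id, a data type, a shape, and a layout.
    // Ids and shapes here are placeholder values.
    logical_tensor src {0, dt::f32, {8, 64}, lt::strided};
    logical_tensor wei {1, dt::f32, {64, 32}, lt::strided};
    logical_tensor mm_out {2, dt::f32, {8, 32}, lt::strided};
    logical_tensor relu_out {3, dt::f32, {8, 32}, lt::strided};

    // "MatMul + ReLU" as a chain: MatMul produces mm_out, which ReLU
    // consumes as its first operand, matching the "+" convention above.
    op matmul {0, op::kind::MatMul, {src, wei}, {mm_out}, "matmul"};
    op relu {1, op::kind::ReLU, {mm_out}, {relu_out}, "relu"};

    graph g {dnnl::engine::kind::cpu};
    g.add_op(matmul);
    g.add_op(relu);
    g.finalize();

    // If the backend matches the chain against a fusion pattern
    // (here, MatMul + Unary), both ops come back in one partition.
    std::vector<partition> parts = g.get_partitions();
    return parts.size() == 1 ? 0 : 1;
}
```

Each returned partition can then be compiled and executed; if no pattern matches, the library falls back to one partition per op, so the application logic stays the same either way.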
