MatMul with a scaling factor. It can be constructed by the [Multiply](@ref dev_guide_op_multiply)
or [Divide](@ref dev_guide_op_divide) operation in Graph API. The scaling
factor is given by users as an input of SDPA. \f$\sqrt{d_k}\f$ in the formula
is not considered part of the SDPA pattern because it is a constant.
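As a numeric illustration of the Scale step (a minimal NumPy sketch of the math only, not Graph API code; the shapes and the common choice of \f$\sqrt{d_k}\f$ as the factor are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
S, d_k = 4, 8                        # sequence length and head size (illustrative)
Q = rng.standard_normal((S, d_k))    # Query
K = rng.standard_normal((S, d_k))    # Key

scale = np.sqrt(d_k)                 # user-provided scaling factor
scores = Q @ K.T                     # MatMul of Query and transposed Key
scaled = scores / scale              # Divide variant of the Scale node

# The Multiply variant with the reciprocal factor is equivalent.
assert np.allclose(scaled, scores * (1.0 / scale))
```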

3. The Mask node is optional and is used to apply an attention mask to the
   output of the previous Scale node. There are two types of masks that can
   be applied:

   1. Explicit user-generated mask: You can explicitly create a mask tensor
      and pass it to the library for the computation of SDPA. In this case, the
      mask can be constructed by the [Add](@ref dev_guide_op_add)
      or [Select](@ref dev_guide_op_select) operation in Graph API for different
      mask policies (for example, causal mask or padding mask). When the
      Add operation is used to apply the mask, the input mask is usually an upper
      triangular matrix with all the elements above the diagonal filled with
      `-inf` and zeroes elsewhere. The `-inf` entries will become zero probability
      after Softmax is applied in the next step.
      Alternatively, a Select operation may be used. In this case, the
      input is a boolean tensor (for example, with `true` on and below the
      diagonal, and `false` above it). A `false` element in the mask forces the
      corresponding element of the scaled output to `-inf`, while a `true`
      element leaves it unchanged.

      ![SDPA-mask-1](images/sdpa-mask-1.png) ![SDPA-mask-2](images/sdpa-mask-2.png)
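The two explicit mask policies can be sketched in NumPy (an illustration of the semantics only, not Graph API code; the tensor names and the causal pattern are assumptions for the example):

```python
import numpy as np

S = 4
scaled = np.arange(S * S, dtype=np.float64).reshape(S, S)  # stand-in for the scaled output

# Add policy: additive mask with zeros on/below the diagonal, -inf above it.
upper = np.triu(np.ones((S, S), dtype=bool), k=1)
add_mask = np.where(upper, -np.inf, 0.0)
masked_add = scaled + add_mask

# Select policy: boolean mask, True keeps the element, False forces -inf.
bool_mask = ~upper                        # True on and below the diagonal
masked_sel = np.where(bool_mask, scaled, -np.inf)

# Both policies produce the same causal-masked scores.
assert np.array_equal(masked_add, masked_sel)
```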

   2. Implicit library-generated mask: You can use the operations in the library
      to generate a mask by constructing a subgraph. Currently, Graph API supports
      generating an implicit causal mask (top-left aligned) using the
      [GenIndex](@ref dev_guide_op_genindex), [GreaterEqual](@ref dev_guide_op_greaterequal),
      and [Select](@ref dev_guide_op_select) operations.

      ![SDPA-mask-3](images/sdpa-mask-3.png)
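The implicit causal-mask subgraph can be mimicked element-wise in NumPy (a sketch of the semantics, assuming GenIndex produces per-element indices along one axis; the variable names are illustrative):

```python
import numpy as np

S = 4
scaled = np.ones((S, S))              # stand-in for the scaled output

# GenIndex analogue: row and column indices for every element.
row = np.arange(S).reshape(S, 1)      # index along the query (row) axis
col = np.arange(S).reshape(1, S)      # index along the key (column) axis

# GreaterEqual: True on and below the diagonal (top-left aligned causal mask).
keep = row >= col

# Select: keep the scaled score where True, otherwise force -inf.
masked = np.where(keep, scaled, -np.inf)

# Matches an explicit lower-triangular causal mask.
expected = np.where(np.tril(np.ones((S, S), dtype=bool)), scaled, -np.inf)
assert np.array_equal(masked, expected)
```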

4. The SoftMax operation takes the masked output and transforms it into
   probabilities between 0 and 1. See the [SoftMax](@ref dev_guide_op_softmax)
   operation in Graph API.
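A small numeric check of how SoftMax turns masked scores into probabilities, with `-inf` entries becoming exactly zero (NumPy sketch; the input values are made up for the example):

```python
import numpy as np

masked = np.array([[0.5, -np.inf, -np.inf],
                   [0.1,  0.3,    -np.inf],
                   [0.2,  0.2,     0.2]])

# Numerically stable row-wise softmax: shift by the row max before exp.
shifted = masked - masked.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

# Each row sums to 1, and the -inf entries get zero probability.
assert np.allclose(probs.sum(axis=-1), 1.0)
assert probs[0, 1] == 0.0 and probs[0, 2] == 0.0
```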
softmax primitives. The reference implementation requires memory to store the
intermediate results of the dot products between Query and Key, which takes
\f$O(S^2)\f$ memory. It may lead to an out-of-memory error when computing
inputs with long sequence lengths on platforms with limited memory. For an
implicit causal mask, the reference implementation is only available on CPU.
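A back-of-the-envelope calculation shows why the \f$O(S^2)\f$ buffer matters (the sequence length and f32 element size here are assumptions for the example):

```python
S = 16384                                      # assumed sequence length
bytes_per_elem = 4                             # f32
score_matrix_bytes = S * S * bytes_per_elem    # one Query x Key score matrix

# 16384^2 elements * 4 bytes = 1 GiB per batch item per head,
# before counting any of the other intermediate buffers.
print(score_matrix_bytes / 2**30)  # prints 1.0 (GiB)
```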

2. The SDPA patterns functionally support all input shapes meeting the shape
   requirements of each operation in the graph. For example, the Add, Multiply,
   Divide, and Select operations require the input tensors to have the same