
Commit ad8e793

mitruska and mmikolajcz authored
[Op][Spec] ISTFT-16 specification (#28807)
### Details:
- ISTFT-16 specification

### Tickets:
- 159378

---------

Co-authored-by: Mateusz Mikolajczyk <mateusz.mikolajczyk@intel.com>
1 parent ecc3477 commit ad8e793

1 file changed: +217 -0 lines changed
  • docs/articles_en/documentation/openvino-ir-format/operation-sets/operation-specs/signals

@@ -0,0 +1,217 @@
.. {#openvino_docs_ops_signals_ISTFT_16}

Inverse Short Time Fourier Transformation (ISTFT)
=================================================

.. meta::
  :description: Learn about ISTFT-16 - a signal processing operation

**Versioned name**: *ISTFT-16*

**Category**: *Signal processing*

**Short description**: *ISTFT* operation performs Inverse Short-Time Fourier Transform (complex-to-real).

**Detailed description**: *ISTFT* performs the Inverse Short-Time Fourier Transform of a complex-valued input tensor
of shape ``[fft_results, frames, 2]`` or ``[batch, fft_results, frames, 2]``, where:

* ``batch`` is the batch size dimension
* ``frames`` is the number of frames, calculated as ``((signal_length - frame_size) / frame_step) + 1`` of the original signal if it was not centered, or ``(signal_length / frame_step) + 1`` otherwise
* ``fft_results`` is calculated as ``(frame_size / 2) + 1`` of the original signal
* ``2`` is the last dimension, holding a complex value represented by a pair of floating-point values (real and imaginary part, respectively)

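The shape formulas above can be checked with a short sketch. The helper name ``stft_output_dims`` is hypothetical, and the code assumes that ``/`` in the formulas means integer (floor) division, which is consistent with ``frame_size = 11`` giving ``fft_results = 6`` in the examples of this specification.

```python
def stft_output_dims(signal_length, frame_size, frame_step, center):
    """Return (fft_results, frames) for a signal of the given length.

    Hypothetical helper; assumes '/' in the spec formulas is floor division.
    """
    if center:
        # Centered (padded) signal: frames = (signal_length / frame_step) + 1
        frames = signal_length // frame_step + 1
    else:
        # Non-centered: frames = ((signal_length - frame_size) / frame_step) + 1
        frames = (signal_length - frame_size) // frame_step + 1
    fft_results = frame_size // 2 + 1
    return fft_results, frames


# Values taken from the examples in this specification.
print(stft_output_dims(56, 11, 3, center=False))
print(stft_output_dims(45, 11, 3, center=True))
```

Both calls describe a signal that yields a ``[6, 16, 2]`` ISTFT data input.
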
The output is the restored real-valued signal in the discrete time domain. The shape of the output is 1D ``[signal_length]`` or 2D ``[batch, signal_length]``.
If the ``signal_length`` input is not provided, the output length is calculated according to the following rules:

* ``default_signal_length = (frames - 1) * frame_step`` for ``center == true``
* ``default_signal_length = (frames - 1) * frame_step + frame_size`` for ``center == false``

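As a quick arithmetic check of these two rules, the following sketch evaluates them for the values used in the examples of this specification (``frames = 16``, ``frame_size = 11``, ``frame_step = 3``). The function name is illustrative only.

```python
def default_signal_length(frames, frame_size, frame_step, center):
    """Default output length when the signal_length input is absent (illustrative)."""
    length = (frames - 1) * frame_step
    if not center:
        # Without centering, the last frame contributes its full size.
        length += frame_size
    return length


print(default_signal_length(16, 11, 3, center=False))  # (16 - 1) * 3 + 11
print(default_signal_length(16, 11, 3, center=True))   # (16 - 1) * 3
```
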
If the ``signal_length`` input is provided, the number of output values is adjusted accordingly:

* If ``signal_length > default_signal_length``, the output is padded with zeros at the end.
* If ``signal_length < default_signal_length``, the generated samples beyond ``signal_length`` are cut off.

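The adjustment can be sketched on a plain Python list of samples. ``fit_to_length`` is a hypothetical helper, not part of any OpenVINO API; it only illustrates the pad-or-truncate rule stated above.

```python
def fit_to_length(samples, signal_length):
    """Pad with trailing zeros or truncate so the result has signal_length samples."""
    samples = list(samples)
    if signal_length > len(samples):
        # Requested length is larger: pad with zeros at the end.
        return samples + [0.0] * (signal_length - len(samples))
    # Requested length is smaller or equal: drop the extra samples.
    return samples[:signal_length]


restored = [0.5, 1.0, -0.5]
print(fit_to_length(restored, 5))
print(fit_to_length(restored, 2))
```
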
The ``window_length`` cannot be larger than ``frame_size``. If it is smaller, the window values are padded with zeros on the left and right sides. The size of the left padding is calculated as ``(frame_size - window_length) // 2``, and the right padding fills the remainder up to ``frame_size``.

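A minimal sketch of this window padding rule, using a hypothetical ``pad_window`` helper on a Python list. When the difference ``frame_size - window_length`` is odd, the extra zero ends up on the right side, which follows from the left padding being computed with floor division.

```python
def pad_window(window, frame_size):
    """Zero-pad a window to frame_size (illustrative helper)."""
    window = list(window)
    window_length = len(window)
    if window_length > frame_size:
        raise ValueError("window_length cannot be larger than frame_size")
    left = (frame_size - window_length) // 2
    right = frame_size - window_length - left  # remainder goes to the right
    return [0.0] * left + window + [0.0] * right


# window_length = 7, frame_size = 11: two zeros on each side.
print(pad_window([1.0] * 7, 11))
# window_length = 6, frame_size = 11: two zeros on the left, three on the right.
print(pad_window([1.0] * 6, 11))
```
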
**Attributes**:

* *center*

  * **Description**: Flag that indicates whether padding has been applied to the original signal. It affects the output shape if the ``signal_length`` input is not provided.
  * **Range of values**:

    * ``false`` - padding has not been applied, default signal length is calculated as ``(frames - 1) * frame_step + frame_size``
    * ``true`` - padding has been applied, default signal length is calculated as ``(frames - 1) * frame_step``

  * **Type**: ``boolean``
  * **Required**: *yes*

* *normalized*

  * **Description**: Flag that indicates whether the input has been normalized. It is needed to correctly restore the signal and denormalize the output. When normalized, the output of the STFT is divided by ``sqrt(frame_size)``.
  * **Range of values**:

    * ``false`` - input has not been normalized
    * ``true`` - input has been normalized

  * **Type**: ``boolean``
  * **Required**: *yes*


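The effect of the *normalized* attribute can be illustrated with a small sketch. It assumes that denormalization simply undoes the STFT scaling by multiplying by ``sqrt(frame_size)``. The specification does not state at which stage of the computation this scaling is applied; because the transform is linear, scaling the input values or the restored samples gives the same result mathematically. The helper name is hypothetical.

```python
import math


def denormalize(values, frame_size, normalized):
    """Undo the 1 / sqrt(frame_size) scaling of a normalized STFT (assumed behavior)."""
    if not normalized:
        return list(values)
    scale = math.sqrt(frame_size)
    return [v * scale for v in values]


print(denormalize([1.0, 0.5], frame_size=16, normalized=True))
print(denormalize([1.0, 0.5], frame_size=16, normalized=False))
```
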
**Inputs**

* **1**: ``data`` - Tensor of type *T*, the ISTFT data input (compatible with a result of STFT operation). **Required.**

  * The data input shape can be 3D ``[fft_results, frames, 2]`` or 4D ``[batch, fft_results, frames, 2]``.

* **2**: ``window`` - Tensor of type *T* and 1D shape ``[window_length]``, specifying the window values applied to restore the signal. The ``window_length`` must be less than or equal to ``frame_size``; if it is smaller, the window is padded with zeros on the left and right sides. **Required.**
* **3**: ``frame_size`` - Scalar tensor of type *T_INT* describing the size of a single frame of the signal to be provided as input to FFT. **Required.**
* **4**: ``frame_step`` - Scalar tensor of type *T_INT* describing the distance (number of samples) between successive frames. **Required.**
* **5**: ``signal_length`` - Scalar or single-element 1D tensor of type *T_INT* describing the desired length of the output signal. If not provided, it is calculated according to the rules presented in the detailed description above. **Optional.**


**Outputs**

* **1**: ``signal`` - Tensor of type *T* and 1D shape ``[signal_length]`` or 2D shape ``[batch, signal_length]`` with real-valued signal data. **Required.**

**Types**

* *T*: any supported floating-point type.

* *T_INT*: ``int64`` or ``int32``.


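Putting the input and output descriptions together, the output shape can be derived from the ``data`` shape and the scalar inputs. The sketch below is not the OpenVINO shape inference code; ``infer_istft_output_shape`` is a hypothetical function, and the check that ``fft_results`` equals ``frame_size // 2 + 1`` is an assumption based on the detailed description.

```python
def infer_istft_output_shape(data_shape, frame_size, frame_step, center,
                             signal_length=None):
    """Derive the ISTFT output shape from the input description (illustrative)."""
    data_shape = list(data_shape)
    if len(data_shape) not in (3, 4) or data_shape[-1] != 2:
        raise ValueError("data must be [fft_results, frames, 2] "
                         "or [batch, fft_results, frames, 2]")
    fft_results, frames = data_shape[-3], data_shape[-2]
    # Assumed consistency check, derived from fft_results = (frame_size / 2) + 1.
    if fft_results != frame_size // 2 + 1:
        raise ValueError("fft_results does not match frame_size")
    if signal_length is None:
        signal_length = (frames - 1) * frame_step
        if not center:
            signal_length += frame_size
    # Keep the batch dimension, if present, in front of the signal length.
    return data_shape[:-3] + [signal_length]


print(infer_istft_output_shape([6, 16, 2], 11, 3, center=False))
print(infer_istft_output_shape([4, 6, 16, 2], 11, 3, center=True))
print(infer_istft_output_shape([6, 16, 2], 11, 3, center=False, signal_length=64))
```

The three calls use the same dimensions as the XML examples in this specification.
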
**Examples**:

*Example 3D input, 1D output signal, center=false, default signal_length:*

.. code-block:: xml
   :force:

   <layer ... type="ISTFT" ... >
       <data center="false" ... />
       <input>
           <port id="0">
               <dim>6</dim>
               <dim>16</dim>
               <dim>2</dim>
           </port>
           <port id="1">
               <dim>7</dim>
           </port>
           <port id="2"></port> <!-- frame_size value: 11 -->
           <port id="3"></port> <!-- frame_step value: 3 -->
       </input>
       <output>
           <port id="4">
               <dim>56</dim>
           </port>
       </output>
   </layer>

*Example 4D input, 2D output signal, center=false, default signal_length:*

.. code-block:: xml
   :force:

   <layer ... type="ISTFT" ... >
       <data center="false" ... />
       <input>
           <port id="0">
               <dim>4</dim>
               <dim>6</dim>
               <dim>16</dim>
               <dim>2</dim>
           </port>
           <port id="1">
               <dim>7</dim>
           </port>
           <port id="2"></port> <!-- frame_size value: 11 -->
           <port id="3"></port> <!-- frame_step value: 3 -->
       </input>
       <output>
           <port id="4">
               <dim>4</dim>
               <dim>56</dim>
           </port>
       </output>
   </layer>


*Example 3D input, 1D output signal, center=true, default signal_length:*

.. code-block:: xml
   :force:

   <layer ... type="ISTFT" ... >
       <data center="true" ... />
       <input>
           <port id="0">
               <dim>6</dim>
               <dim>16</dim>
               <dim>2</dim>
           </port>
           <port id="1">
               <dim>7</dim>
           </port>
           <port id="2"></port> <!-- frame_size value: 11 -->
           <port id="3"></port> <!-- frame_step value: 3 -->
       </input>
       <output>
           <port id="4">
               <dim>45</dim>
           </port>
       </output>
   </layer>

*Example 4D input, 2D output signal, center=true, default signal_length:*

.. code-block:: xml
   :force:

   <layer ... type="ISTFT" ... >
       <data center="true" ... />
       <input>
           <port id="0">
               <dim>4</dim>
               <dim>6</dim>
               <dim>16</dim>
               <dim>2</dim>
           </port>
           <port id="1">
               <dim>7</dim>
           </port>
           <port id="2"></port> <!-- frame_size value: 11 -->
           <port id="3"></port> <!-- frame_step value: 3 -->
       </input>
       <output>
           <port id="4">
               <dim>4</dim>
               <dim>45</dim>
           </port>
       </output>
   </layer>


*Example 3D input, 1D output signal, center=false, signal_length input provided:*

.. code-block:: xml
   :force:

   <layer ... type="ISTFT" ... >
       <data center="false" ... />
       <input>
           <port id="0">
               <dim>6</dim>
               <dim>16</dim>
               <dim>2</dim>
           </port>
           <port id="1">
               <dim>7</dim>
           </port>
           <port id="2"></port> <!-- frame_size value: 11 -->
           <port id="3"></port> <!-- frame_step value: 3 -->
           <port id="4"></port> <!-- signal_length value: 64 -->
       </input>
       <output>
           <port id="5">
               <dim>64</dim>
           </port>
       </output>
   </layer>
