Stateful models and State API

.. toctree::
   :maxdepth: 1
   :hidden:

   stateful-models/obtaining-stateful-openvino-model

A "stateful model" is a model that implicitly preserves data between two consecutive inference calls. The tensors saved from one run are kept in an internal memory buffer called a "state" or a "variable" and may be passed to the next run, while never being exposed as model output. In contrast, for a "stateless" model to pass data between runs, all produced data is returned as output and needs to be handled by the application itself for reuse at the next execution.

[Figure: example comparison between stateless and stateful model implementations]
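The contrast between the two approaches can be sketched with a toy running-sum "model" in plain C++. This is not OpenVINO code: `stateless_step`, `StatefulModel`, `run_stateless`, and `run_stateful` are hypothetical names chosen for illustration, and a single `float` stands in for a tensor.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stateless variant: the carried value is an explicit input and output,
// so the caller must feed it back in on every call.
// Returns {model output, new carried value}.
std::pair<float, float> stateless_step(float input, float carried) {
    float sum = carried + input;
    return std::make_pair(sum, sum);
}

// Stateful variant: the carried value lives inside the object and is never
// exposed as a separate output; the caller only passes the new input.
class StatefulModel {
public:
    float infer(float input) {
        state_ += input;  // update the internal state
        return state_;    // the output is computed from the state
    }

private:
    float state_ = 0.0f;  // preserved between calls
};

// Drives the stateless variant: the application owns the carried value.
std::vector<float> run_stateless(const std::vector<float>& inputs) {
    std::vector<float> outputs;
    float carried = 0.0f;
    for (float x : inputs) {
        std::pair<float, float> result = stateless_step(x, carried);
        outputs.push_back(result.first);
        carried = result.second;  // manual copy from output back to input
    }
    return outputs;
}

// Drives the stateful variant: no bookkeeping in the application.
std::vector<float> run_stateful(const std::vector<float>& inputs) {
    StatefulModel model;
    std::vector<float> outputs;
    for (float x : inputs) outputs.push_back(model.infer(x));
    return outputs;
}
```

Both drivers produce the same outputs for the same inputs. The only difference is who is responsible for carrying the value from one call to the next.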

What is more, when a model includes TensorIterator or Loop operations, turning it into a stateful model makes it possible to retrieve intermediate values from each execution iteration (thanks to the LowLatency transformation). Otherwise, all iterations need to finish before the data becomes available.

Text generation is a good usage example of stateful models, as it requires multiple inference calls to output a complete sentence, each run producing a single output token. Information from one run is passed to the next inference as context, which a stateful model can handle natively. Potential benefits in this and other scenarios include:

  1. model execution speedup - data in states is stored in a form optimized for OpenVINO plugins, which helps to execute the model more efficiently. Importantly, requesting data from the state too often may reduce the expected performance gains or even lead to losses. Use the state mechanism only if the state data is not accessed very frequently.
  2. user code simplification - states can replace code-based solutions for such scenarios as providing initial values for the first inference call or copying data from model outputs to inputs. With states, OpenVINO manages these cases internally, which also removes the potential overhead of data representation conversion.
  3. data processing - some use cases require processing of data sequences. When such a sequence is of known length and short enough, you can process it with RNN-like models that contain a cycle inside. When the length is not known, as in the case of online speech recognition or time series forecasting, you can divide the data into small portions and process it step by step, which requires addressing the dependency between data portions. States fulfil this purpose well: models save some data between inference runs and, when one dependent sequence is over, the state may be reset to the initial value so that a new sequence can be started.
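The step-by-step processing described in the last item can be sketched as follows. This is a toy stand-in written in plain C++, not OpenVINO code: `StreamingMean` and `process_sequences` are hypothetical names, and the "model" only keeps a running mean so that the effect of resetting between sequences is easy to see.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a stateful model: keeps a running sum and sample count
// across calls, so a sequence of unknown length can arrive in portions.
class StreamingMean {
public:
    // Processes one non-empty portion and returns the mean of everything
    // seen so far in the current sequence.
    float process(const std::vector<float>& portion) {
        assert(!portion.empty());
        for (float x : portion) {
            sum_ += x;
            ++count_;
        }
        return sum_ / static_cast<float>(count_);
    }

    // Returns the state to its initial value before a new sequence starts.
    void reset() {
        sum_ = 0.0f;
        count_ = 0;
    }

private:
    float sum_ = 0.0f;
    std::size_t count_ = 0;
};

typedef std::vector<float> Portion;
typedef std::vector<Portion> Sequence;

// Processes each sequence portion by portion and resets between sequences.
// Returns the final mean of every sequence.
std::vector<float> process_sequences(const std::vector<Sequence>& sequences) {
    StreamingMean model;
    std::vector<float> results;
    for (const Sequence& sequence : sequences) {
        float mean = 0.0f;
        for (const Portion& portion : sequence) mean = model.process(portion);
        results.push_back(mean);
        model.reset();  // the next sequence must not see this one's data
    }
    return results;
}
```

Without the `reset()` call, the second sequence would be averaged together with the first one, which is the dependency problem the state reset is meant to solve.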

OpenVINO Stateful Model Representation

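To make a model stateful, OpenVINO replaces looped pairs of Parameter and Result with its own two operations:

* ReadValue - reads the data from the state and returns it as output,
* Assign - accepts the data as input and saves it in the state for the next inference call.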

Each pair of these operations works with state, which is automatically saved between inference runs and can be reset when needed. This way, the burden of copying data is shifted from the application code to OpenVINO and all related internal work is hidden from the user.
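As a rough mental model of how such a pair cooperates, here is a minimal sketch in plain C++. It only mimics the behavior of the two operations, known in OpenVINO as ReadValue and Assign. The `Variable` struct and the functions `read_value`, `assign`, `reset`, and `infer` are hypothetical illustrations, not OpenVINO types or calls, and the toy graph (`out = previous * 0.5 + input`) is an arbitrary assumption.

```cpp
#include <cassert>

// Toy model of one state ("variable"). Not an OpenVINO type.
struct Variable {
    bool is_set = false;  // false until the first assign, and after a reset
    float value = 0.0f;
};

// Mimics ReadValue: returns the stored value, or the initial value if the
// state has not been written yet (or has been reset).
float read_value(const Variable& var, float init) {
    return var.is_set ? var.value : init;
}

// Mimics Assign: stores the value for the next inference call.
void assign(Variable& var, float new_value) {
    var.value = new_value;
    var.is_set = true;
}

// Clears the state so that the next read_value falls back to the initial value.
void reset(Variable& var) {
    var.is_set = false;
}

// One "inference" of a toy graph: out = read_value(state) * 0.5 + input,
// followed by assign(state, out).
float infer(Variable& var, float input) {
    float previous = read_value(var, /*init=*/0.0f);
    float out = previous * 0.5f + input;
    assign(var, out);
    return out;
}
```

The application calls only `infer`. Reading the previous value and saving the new one both happen inside the "model", which is the point of replacing the Parameter and Result pair.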

There are three methods of turning an OpenVINO model into a stateful one:

Running Inference of Stateful Models

For the most basic applications, stateful models work out of the box. For additional control, OpenVINO offers a dedicated API, whose methods enable you to both retrieve and change data saved in states between inference runs. OpenVINO Runtime uses ``ov::InferRequest::query_state`` to get the list of states from a model and the ``ov::VariableState`` class to operate on states.

``ov::InferRequest`` methods:

* ``std::vector<VariableState> query_state()`` - gets all available states for the given inference request
* ``void reset_state()`` - resets all states to their default values

``ov::VariableState`` methods:

* ``std::string get_name() const`` - returns the name (variable_id) of the corresponding state (variable)
* ``void reset()`` - resets the state to the default value
* ``void set_state(const Tensor& state)`` - sets a new value for the state
* ``Tensor get_state() const`` - returns the current value of the state

Using multiple threads

Note that if multiple independent sequences are involved, several threads may be used to process each sequence in its own infer request. However, using several infer requests for one sequence is not recommended, as the state would not be passed between them automatically. Instead, each run performed in a different infer request than the previous one would require the state to be set "manually", using the ``ov::VariableState::set_state`` method.
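The difference between the two situations can be illustrated with a toy sketch in plain C++. `ToyRequest`, `run_independent`, and `run_split` are hypothetical names. The `get_state` and `set_state` members only echo the State API method names listed above and are not the OpenVINO implementation. Threads are left out to keep the example small: since each `ToyRequest` owns its state and shares nothing, each one could be driven from its own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an infer request that owns its own state (a running sum).
class ToyRequest {
public:
    float infer(float input) {
        state_ += input;
        return state_;
    }
    float get_state() const { return state_; }
    void set_state(float value) { state_ = value; }

private:
    float state_ = 0.0f;
};

// Independent sequences: one request per sequence, no state sharing needed.
// Returns the last output of every sequence.
std::vector<float> run_independent(const std::vector<std::vector<float> >& sequences) {
    std::vector<float> totals;
    for (const std::vector<float>& sequence : sequences) {
        ToyRequest request;  // fresh state for every sequence
        float last = 0.0f;
        for (float x : sequence) last = request.infer(x);
        totals.push_back(last);
    }
    return totals;
}

// One sequence split across two requests: the state has to be carried over
// by hand, otherwise the second request starts from its own initial state.
float run_split(const std::vector<float>& sequence, bool copy_state) {
    assert(sequence.size() >= 2);
    ToyRequest first;
    ToyRequest second;
    const std::size_t half = sequence.size() / 2;
    for (std::size_t i = 0; i < half; ++i) first.infer(sequence[i]);
    if (copy_state) second.set_state(first.get_state());  // "manual" transfer
    float last = 0.0f;
    for (std::size_t i = half; i < sequence.size(); ++i) last = second.infer(sequence[i]);
    return last;
}
```

Calling `run_split` with `copy_state` set to `false` loses the contribution of the first half of the sequence, which is why spreading one sequence over several infer requests is discouraged.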

[Figure: diagram of how the initial state value is set or reset]

Resetting states

Whenever it is necessary to set the initial value of a state or reset it, an initializing
subgraph for the ReadValue operation and a special reset method are provided.
One case needs to be avoided: if you reset a state, then query for states and retrieve
the state data, the returned values are undefined.

Stateful Model Application Example

Here is a code example demonstrating inference of three independent sequences of data. One infer request and one thread are used. The state should be reset between consecutive sequences.

.. tab:: C++

      .. doxygensnippet:: docs/articles_en/assets/snippets/ov_stateful_models_intro.cpp
         :language: cpp
         :fragment: [ov:state_api_usage]


You can find more examples demonstrating how to work with states in other articles: