
Commit 8806f40

nosovmikavladimi authored and rnugmanx committed
[DOC] Model caching feature overview (openvinotoolkit#5519)
* Docs: Model caching feature overview
* Update docs/IE_DG/Intro_to_Performance.md
* Apply suggestions from code review
* Review comments
  - Moved code examples to snippets
  - Added link to Model Caching overview from "Inference Engine Developer Guide"
  - Few minor changes
* Update docs/IE_DG/Intro_to_Performance.md

Co-authored-by: Anastasiya Ageeva <anastasiya.ageeva@intel.com>
1 parent 0ab7dfa commit 8806f40

9 files changed: +142 -0 lines changed

docs/IE_DG/Intro_to_Performance.md

+6
@@ -31,6 +31,12 @@ input images to achieve optimal throughput. However, high batch size also comes
 latency penalty. So, for more real-time oriented usages, lower batch sizes (as low as a single input) are used.
 Refer to the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample, which allows latency vs. throughput measuring.
 
+## Using Caching API for first inference latency optimization
+Starting with the 2021.4 release, Inference Engine provides the ability to enable internal caching of loaded networks.
+For some devices, this can significantly reduce the network load time at application startup.
+Internally, caching uses the plugin's Export/ImportNetwork flow, as it is done for the [Compile tool](../../inference-engine/tools/compile_tool/README.md), while relying on the regular ReadNetwork/LoadNetwork API.
+Refer to the [Model Caching Overview](Model_caching_overview.md) for a more detailed explanation.
+
 ## Using Async API
 To gain better performance on accelerators, such as VPU, the Inference Engine uses the asynchronous approach (see
 [Integrating Inference Engine in Your Application (current API)](Integrate_with_customer_application_new_API.md)).
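
A minimal editorial sketch of what enabling the cache looks like in application code, mirroring the snippets added later in this commit (`myCacheFolder`, the model path, and the `GNA` device are illustrative, not part of the committed docs):

```cpp
#include <ie_core.hpp>

int main() {
    InferenceEngine::Core ie;
    // Enable caching of compiled networks (folder name is illustrative)
    ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});
    // Subsequent loads reuse the cached blob if the device supports Import/Export
    auto execNet = ie.LoadNetwork("/tmp/myModel.xml", "GNA", {});
    return 0;
}
```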

docs/IE_DG/Model_caching_overview.md

+65
@@ -0,0 +1,65 @@
# Model Caching Overview {#openvino_docs_IE_DG_Model_caching_overview}

## Introduction

As described in [Inference Engine Introduction](inference_engine_intro.md), a common application flow consists of the following steps:

1. **Create Inference Engine Core object**

2. **Read the Intermediate Representation** - Read an Intermediate Representation file into an object of the `InferenceEngine::CNNNetwork` class

3. **Prepare inputs and outputs**

4. **Set configuration** - Pass device-specific loading configurations to the device

5. **Compile and Load Network to the device** - Use the `InferenceEngine::Core::LoadNetwork()` method with a specific device

6. **Set input data**

7. **Execute**

Step #5 can potentially perform several time-consuming device-specific optimizations and network compilations,
and such delays can lead to a bad user experience on application startup. To avoid this, some devices offer
the Import/Export network capability, and it is possible to either use the [Compile tool](../../inference-engine/tools/compile_tool/README.md)
or enable model caching to export the compiled network automatically. Reusing cached networks can significantly reduce the network load time.

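To make the Import/Export capability concrete, here is a rough editorial sketch of the manual export/import flow that model caching automates (not part of the committed file; paths and the `GNA` device name are illustrative):

```cpp
#include <ie_core.hpp>

#include <string>

int main() {
    std::string modelPath = "/tmp/myModel.xml";   // illustrative paths and device
    std::string blobPath  = "/tmp/myModel.blob";
    std::string device    = "GNA";

    InferenceEngine::Core ie;
    // First run: read the IR, compile it for the device, and export the compiled blob
    auto cnnNet  = ie.ReadNetwork(modelPath);
    auto execNet = ie.LoadNetwork(cnnNet, device);
    execNet.Export(blobPath);

    // Later runs: import the compiled blob directly, skipping ReadNetwork/LoadNetwork
    auto importedNet = ie.ImportNetwork(blobPath, device);
    return 0;
}
```
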
## Set "CACHE_DIR" config option to enable model caching
28+
29+
To enable model caching, the application must specify the folder where to store cached blobs. It can be done like this
30+
31+
32+
@snippet snippets/InferenceEngine_Caching0.cpp part0
33+
34+
With this code, if device supports Import/Export network capability, cached blob is automatically created inside the `myCacheFolder` folder
35+
CACHE_DIR config is set to the Core object. If device does not support Import/Export capability, cache is just not created and no error is thrown
36+
37+
Depending on your device, total time for loading network on application startup can be significantly reduced.
38+
Please also note that very first LoadNetwork (when cache is not yet created) takes slightly longer time to 'export' compiled blob into a cache file
39+
![caching_enabled]
40+
41+
## Even faster: use LoadNetwork(modelPath)
42+
43+
In some cases, applications do not need to customize inputs and outputs every time. Such applications always
44+
call `cnnNet = ie.ReadNetwork(...)`, then `ie.LoadNetwork(cnnNet, ..)` and it can be further optimized.
45+
For such cases, more convenient API to load network in one call is introduced in the 2021.4 release.
46+
47+
@snippet snippets/InferenceEngine_Caching1.cpp part1
48+
49+
With enabled model caching, total load time is even smaller - in case that ReadNetwork is optimized as well
50+
51+
@snippet snippets/InferenceEngine_Caching2.cpp part2
52+
53+
![caching_times]
54+
55+
56+
## Advanced examples
57+
58+
Not every device supports network import/export capability, enabling of caching for such devices do not have any effect.
59+
To check in advance if a particular device supports model caching, your application can use the following code:
60+
61+
@snippet snippets/InferenceEngine_Caching3.cpp part3
62+
63+
64+
[caching_enabled]: ../img/caching_enabled.png
65+
[caching_times]: ../img/caching_times.png

docs/doxygen/ie_docs.xml

+1
@@ -285,6 +285,7 @@ limitations under the License.
 <tab type="user" title="Inference Engine API Changes History" url="@ref openvino_docs_IE_DG_API_Changes"/>
 <tab type="user" title="Inference Engine Memory primitives" url="@ref openvino_docs_IE_DG_Memory_primitives"/>
 <tab type="user" title="Inference Engine Device Query API" url="@ref openvino_docs_IE_DG_InferenceEngine_QueryAPI"/>
+<tab type="user" title="Inference Engine Model Caching" url="@ref openvino_docs_IE_DG_Model_caching_overview"/>
 <tab type="usergroup" title="Inference Engine Extensibility Mechanism" url="@ref openvino_docs_IE_DG_Extensibility_DG_Intro">
 <tab type="user" title="Extension Library" url="@ref openvino_docs_IE_DG_Extensibility_DG_Extension"/>
 <tab type="user" title="Custom Operations" url="@ref openvino_docs_IE_DG_Extensibility_DG_AddingNGraphOps"/>

docs/img/caching_enabled.png

+3

docs/img/caching_times.png

+3
docs/snippets/InferenceEngine_Caching0.cpp

+17
@@ -0,0 +1,17 @@
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GNA";
    std::map<std::string, std::string> deviceConfig;
//! [part0]
    InferenceEngine::Core ie;                                  // Step 1: create Inference engine object
    ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});  // Step 1b: Enable caching
    auto cnnNet = ie.ReadNetwork(modelPath);                   // Step 2: ReadNetwork
    //...                                                      // Step 3: Prepare inputs/outputs
    //...                                                      // Step 4: Set device configuration
    ie.LoadNetwork(cnnNet, device, deviceConfig);              // Step 5: LoadNetwork
//! [part0]
    return 0;
}

docs/snippets/InferenceEngine_Caching1.cpp

+13
@@ -0,0 +1,13 @@
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GNA";
    std::map<std::string, std::string> deviceConfig;
//! [part1]
    InferenceEngine::Core ie;                         // Step 1: create Inference engine object
    ie.LoadNetwork(modelPath, device, deviceConfig);  // Step 2: LoadNetwork by model file path
//! [part1]
    return 0;
}
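
As a follow-up to the snippet above, a hedged editorial sketch of how the returned executable network would typically be used for steps 6-7 from the overview; `"input_name"` is a placeholder, and the model path and device are illustrative:

```cpp
#include <ie_core.hpp>

#include <string>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";   // illustrative model path and device
    std::string device = "GNA";

    Core ie;
    ExecutableNetwork execNet = ie.LoadNetwork(modelPath, device, {});  // Steps 1-5 in one call
    InferRequest request = execNet.CreateInferRequest();
    // Step 6: fill the input blob(s); "input_name" is a placeholder for the model's real input name
    // Blob::Ptr input = request.GetBlob("input_name");
    request.Infer();  // Step 7: execute synchronously
    return 0;
}
```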

docs/snippets/InferenceEngine_Caching2.cpp

+14
@@ -0,0 +1,14 @@
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GNA";
    std::map<std::string, std::string> deviceConfig;
//! [part2]
    InferenceEngine::Core ie;                                  // Step 1: create Inference engine object
    ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});  // Step 1b: Enable caching
    ie.LoadNetwork(modelPath, device, deviceConfig);           // Step 2: LoadNetwork by model file path
//! [part2]
    return 0;
}

docs/snippets/InferenceEngine_Caching3.cpp

+20
@@ -0,0 +1,20 @@
#include <ie_core.hpp>

#include <algorithm>
#include <string>
#include <vector>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string deviceName = "GNA";
    std::map<std::string, std::string> deviceConfig;
    InferenceEngine::Core ie;
//! [part3]
    // Get the list of supported metrics
    std::vector<std::string> keys = ie.GetMetric(deviceName, METRIC_KEY(SUPPORTED_METRICS));

    // Find the 'IMPORT_EXPORT_SUPPORT' metric in the supported metrics
    auto it = std::find(keys.begin(), keys.end(), METRIC_KEY(IMPORT_EXPORT_SUPPORT));

    // If the 'IMPORT_EXPORT_SUPPORT' metric exists, check its value
    bool cachingSupported = (it != keys.end()) && ie.GetMetric(deviceName, METRIC_KEY(IMPORT_EXPORT_SUPPORT));
//! [part3]
    return 0;
}
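
Building on the check above, an editorial sketch of how an application might enable caching only when the device reports Import/Export support (the cache folder name and `GNA` device are illustrative; this file is not part of the committed snippets):

```cpp
#include <ie_core.hpp>

#include <algorithm>
#include <string>
#include <vector>

int main() {
    using namespace InferenceEngine;
    std::string deviceName = "GNA";

    Core ie;
    // Query the device capabilities as in the snippet above
    std::vector<std::string> keys = ie.GetMetric(deviceName, METRIC_KEY(SUPPORTED_METRICS));
    bool cachingSupported =
        std::find(keys.begin(), keys.end(), METRIC_KEY(IMPORT_EXPORT_SUPPORT)) != keys.end() &&
        ie.GetMetric(deviceName, METRIC_KEY(IMPORT_EXPORT_SUPPORT));

    // Enable the cache only when it will actually be used
    if (cachingSupported) {
        ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});  // illustrative folder name
    }
    return 0;
}
```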
