Commit 9706b78

authored Nov 29, 2024
[intel-npu] Publishing NPU_DEFER_WEIGHTS_LOAD property (#27790)
### Details:
 - Moving NPU_DEFER_WEIGHTS_LOAD property from private to public
 - Adding it to the documentation

### Tickets:
 - *none*

1 parent 9af054a commit 9706b78

5 files changed

+16
-14
lines changed

docs/articles_en/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.rst

+2 -1
@@ -146,6 +146,8 @@ offer a limited set of supported OpenVINO features.
 ov::intel_npu::turbo
 ov::intel_npu::tiles
 ov::intel_npu::max_tiles
+ov::intel_npu::bypass_umd_caching
+ov::intel_npu::defer_weights_load
 
 .. tab-item:: Read-only properties
 

@@ -168,7 +170,6 @@ offer a limited set of supported OpenVINO features.
 ov::intel_npu::device_alloc_mem_size
 ov::intel_npu::device_total_mem_size
 ov::intel_npu::driver_version
-ov::intel_npu::bypass_umd_caching
 
 
 .. note::

src/inference/include/openvino/runtime/intel_npu/properties.hpp

+7
@@ -95,5 +95,12 @@ static constexpr ov::Property<int64_t> max_tiles{"NPU_MAX_TILES"};
  */
 static constexpr ov::Property<bool> bypass_umd_caching{"NPU_BYPASS_UMD_CACHING"};
 
+/**
+ * @brief [Only for NPU Plugin]
+ * Type: boolean, default is false
+ * This option allows to delay loading the weights until inference is created
+ */
+static constexpr ov::Property<bool> defer_weights_load{"NPU_DEFER_WEIGHTS_LOAD"};
+
 } // namespace intel_npu
 } // namespace ov

src/plugins/intel_npu/README.md

+1
@@ -176,6 +176,7 @@ The following properties are supported:
 | `ov::intel_npu::tiles`/</br>`NPU_TILES` | RW | Sets the number of npu tiles to compile the model for | `[0-]` | `-1` |
 | `ov::intel_npu::max_tiles`/</br>`NPU_MAX_TILES` | RW | Maximum number of tiles supported by the device we compile for. Can be set for offline compilation. If not set, it will be populated by driver.| `[0-]` | `[1-6] depends on npu platform` |
 | `ov::intel_npu::bypass_umd_caching`/</br>`NPU_BYPASS_UMD_CACHING` | RW | Bypass the caching of compiled models in UMD. | `YES`/ `NO`| `NO` |
+| `ov::intel_npu::defer_weights_load`/</br>`NPU_DEFER_WEIGHTS_LOAD` | RW | Delay loading the weights until inference is created. | `YES`/ `NO`| `NO` |
 
 &nbsp;
 ### Performance Hint: Default Number of DPU Groups / DMA Engines

src/plugins/intel_npu/src/al/include/intel_npu/npu_private_properties.hpp

-7
@@ -305,13 +305,6 @@ static constexpr ov::Property<BatchMode> batch_mode{"NPU_BATCH_MODE"};
  */
 static constexpr ov::Property<int64_t> create_executor{"NPU_CREATE_EXECUTOR"};
 
-/**
- * @brief [Only for NPU Plugin]
- * Type: boolean, default is false
- * This option allows to omit loading the weights until inference is created
- */
-static constexpr ov::Property<bool> defer_weights_load{"NPU_DEFER_WEIGHTS_LOAD"};
-
 /**
  * @brief Read-only property to get the name of used backend
  */

src/plugins/intel_npu/src/plugin/src/plugin.cpp

+6 -6
@@ -489,6 +489,12 @@ Plugin::Plugin()
            [](const Config& config) {
                return config.get<BYPASS_UMD_CACHING>();
            }}},
+          {ov::intel_npu::defer_weights_load.name(),
+           {true,
+            ov::PropertyMutability::RW,
+            [](const Config& config) {
+                return config.get<DEFER_WEIGHTS_LOAD>();
+            }}},
           // NPU Private
           // =========
           {ov::intel_npu::dma_engines.name(),
@@ -544,12 +550,6 @@ Plugin::Plugin()
            [](const Config& config) {
                return config.get<CREATE_EXECUTOR>();
            }}},
-          {ov::intel_npu::defer_weights_load.name(),
-           {false,
-            ov::PropertyMutability::RW,
-            [](const Config& config) {
-                return config.get<DEFER_WEIGHTS_LOAD>();
-            }}},
           {ov::intel_npu::dynamic_shape_to_static.name(),
            {false,
             ov::PropertyMutability::RW,
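
Reviewer note: the two `plugin.cpp` hunks move the `defer_weights_load` entry out of the `// NPU Private` block and change its first field from `false` to `true`. The sketch below is a simplified, self-contained model of that kind of registry, assuming the first field acts as an "is public" flag that decides whether a property is advertised to users. That reading is an assumption inferred from the diff, not something the diff states. `Config`, `Mutability`, `Entry`, `Registry`, `make_registry` and `public_names` are hypothetical stand-ins, not OpenVINO types or functions.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for the plugin's types; NOT the real OpenVINO classes.
enum class Mutability { RO, RW };

struct Config {
    bool defer_weights_load = false;  // mirrors the documented default of false
};

// name -> (is_public, mutability, getter), shaped like the entries in the diff.
// Treating the first field as a visibility flag is an assumption.
using Entry = std::tuple<bool, Mutability, std::function<bool(const Config&)>>;
using Registry = std::map<std::string, Entry>;

// Build a one-entry registry; `is_public` models the field this commit flips.
Registry make_registry(bool is_public) {
    Registry registry;
    registry.emplace("NPU_DEFER_WEIGHTS_LOAD",
                     Entry{is_public, Mutability::RW,
                           [](const Config& c) { return c.defer_weights_load; }});
    return registry;
}

// Collect the names of entries whose visibility flag is set.
std::vector<std::string> public_names(const Registry& registry) {
    std::vector<std::string> names;
    for (const auto& kv : registry) {
        if (std::get<0>(kv.second)) {
            names.push_back(kv.first);
        }
    }
    return names;
}
```

Under this model, `public_names(make_registry(false))` is empty and `public_names(make_registry(true))` contains `NPU_DEFER_WEIGHTS_LOAD`, which matches the commit's stated intent of moving the property from private to public. The getter and mutability are the same in both cases.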
