
Commit aafb3fc

NPUW Hotfixes: Memory and L0 pipeline (#27826)
### Details:
- Keep tensors for the decompression cut-off in a host-side closure, not a lazy tensor, so they are not uploaded to the bank and detached after that
  - Otherwise this leads to 2x memory consumption and a subsequent crash
- Relaxed the requirements for enabling the unfolded execution, so it may still happen if there are single-call functions that require DCOFF (previously, having those would reject the unfolded path)

### Tickets:
- C-155523 (most likely related)

@smirnov-alexey please take care of the release branch cherry-pick
1 parent fc0b54e commit aafb3fc

2 files changed: +7 −8

src/plugins/intel_npu/src/plugin/npuw/compiled_model.cpp

+3 −2
```diff
@@ -727,8 +727,9 @@ std::shared_ptr<ov::ISyncInferRequest> ov::npuw::CompiledModel::create_sync_infe
     const auto num_submodels = m_compiled_submodels.size();
     for (std::size_t idx = 0u; idx < num_submodels; idx++) {
         const auto& comp_model_desc = m_compiled_submodels[idx];
-        if (!comp_model_desc.replaced_by.has_value()) {
-            // not a funcall, do nothing
+        if (!comp_model_desc.replaced_by.has_value() || comp_model_desc.forced_to_fcall) {
+            // not a funcall, do nothing, or a subgraph that was forced to funcall
+            // (a 1-call function) - skip
             continue;
         }
         const auto real_idx = comp_model_desc.replaced_by.value();
```
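Below is a minimal, self-contained sketch of the skip condition this hunk introduces (the struct and values are simplified stand-ins for illustration, not the actual ov::npuw types): subgraphs forced to a single-call funcall are now bypassed the same way plain, non-funcall subgraphs are.

```cpp
#include <cstddef>
#include <iostream>
#include <optional>
#include <vector>

// Simplified stand-in for the compiled-submodel descriptor used in the diff.
struct CompiledModelDesc {
    std::optional<std::size_t> replaced_by;  // set when the subgraph is a call to a function body
    bool forced_to_fcall = false;            // a 1-call function folded into funcall form
};

int main() {
    // Hypothetical contents of m_compiled_submodels, just to exercise the predicate.
    std::vector<CompiledModelDesc> submodels = {
        {std::nullopt, false},   // plain subgraph              -> skipped (as before)
        {std::size_t{0}, true},  // forced 1-call function      -> now also skipped
        {std::size_t{0}, false}, // regular multi-call function -> still processed
    };

    for (std::size_t idx = 0u; idx < submodels.size(); idx++) {
        const auto& comp_model_desc = submodels[idx];
        // Same predicate as the patched loop: skip non-funcalls and subgraphs
        // that were forced to a single-call funcall.
        if (!comp_model_desc.replaced_by.has_value() || comp_model_desc.forced_to_fcall) {
            continue;
        }
        std::cout << "setting up function " << comp_model_desc.replaced_by.value()
                  << " for submodel " << idx << "\n";
    }
    return 0;
}
```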

src/plugins/intel_npu/src/plugin/npuw/partitioning/patterns/dcoff.cpp

+4 −6
```diff
@@ -97,12 +97,10 @@ ClosureRemap build_remap(const Function& fbody, const DCOFFParams& params_to) {
             LOG_DEBUG("This is an OK parameter, will be kept");
             m.closure_remap.push_back(i - fbody._param_offset);
 
-            // Check if unpack is indeed required
-            const auto& type = param->get_element_type();
-            if (type == ov::element::i4 || type == ov::element::u4 || type == ov::element::i8 ||
-                type == ov::element::u8) {
-                m.weights_to_unpack.insert(i - fbody._param_offset);
-            }
+            // FIXME: type should be queried from a lazy tensor
+            // and compared against param->get_element_type()
+            // to decide 100%
+            m.weights_to_unpack.insert(i - fbody._param_offset);
         }
 
         // Process zero points for parameters
```
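As a minimal sketch of what this hunk changes (simplified stand-ins, not the real ov::element or DCOFF types): the parameter-type check that previously gated unpacking is dropped, every kept closure parameter is recorded in weights_to_unpack, and the FIXME notes that a precise decision should compare the lazy tensor's type against the parameter's type.

```cpp
#include <cstddef>
#include <set>

// Simplified stand-in for ov::element::Type_t, only for illustration.
enum class ElementType { i4, u4, i8, u8, f16, f32 };

// Old behavior: unpack only low-bit / quantized parameter types.
bool needs_unpack_old(ElementType type) {
    return type == ElementType::i4 || type == ElementType::u4 ||
           type == ElementType::i8 || type == ElementType::u8;
}

int main() {
    std::set<std::size_t> weights_to_unpack;
    const std::size_t param_offset = 1;  // stand-in for fbody._param_offset

    struct Param {
        std::size_t index;
        ElementType type;
    };
    const Param params[] = {{1, ElementType::i4}, {2, ElementType::f16}};

    for (const auto& p : params) {
        // New behavior: always record the parameter for unpacking; the FIXME in
        // the diff says the precise decision should compare the lazy tensor's
        // type with the parameter's type, which this sketch does not model.
        weights_to_unpack.insert(p.index - param_offset);
        (void)needs_unpack_old(p.type);  // old predicate, kept only for contrast
    }
    return static_cast<int>(weights_to_unpack.size());
}
```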
