[DOCS] Dependencies and Building for OpenVINO GenAI article for master (#25908)

msmykx-intel · Kadian · commit 044e5d2aacae · 2024-08-06T12:40:55.000+01:00
Adding information on the OpenVINO GenAI Dependencies and a reference link to the GenAI build instructions in the user docs.

Added a new pattern in pattern matcher

[CPU] Avoid rounding to zero for Reduce node in quantized models (#25766)
- *If both the input and output precisions of the Reduce node are integers in the original model, rounding toward zero should be done before converting the intermediate floating-point value to an integer.*
- *However, if these integer precisions result from quantization, no such rounding should be done, in order to maintain accuracy.*
- *Add corresponding test cases.*
- *CVS-147352*

Correct clang format issues
Tried to resolve the segmentation fault
Corrected clang format error
Tried to correct segmentation fault
Removed std::move
Using std::move with much more caution
Modified comments

Change index precision from `i64` to `i32` in MaxPool14 to MaxPool8 downgrade transformation (#25514)
- CVS-146277

[DOCS] Corrected build guides in docs. (#25922)
- Corrected build guides in docs.

[NPU] Disable MCL in case of UD28 (#25903)
- *The UD28 Windows driver version does not support the MutableCommandList feature as expected, so the plugin disables this feature when this driver is used.*
- *EISW-133845*

[GPU] Bugfix reorder for byfx format (#25782)
+ Reorder returns an OOR error while handling byfx from a fused permute parent
- CVS-147330

---------

Signed-off-by: Min, Byung-il <byungil.min@intel.com>
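The first two bullets of the Reduce change can be illustrated with ReduceMean, whose intermediate floating-point mean is usually fractional. This C++ sketch is hypothetical: `reduce_mean_int` is not a CPU plugin function, ReduceMean is only one example of a Reduce node, and the round-to-nearest behaviour on the quantized path is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not plugin code): ReduceMean over integer inputs,
// computed in floating point and then converted back to an integer output.
//
// original_int_precisions == true  -> integer in/out precisions come from the
//                                     original model: truncate toward zero.
// original_int_precisions == false -> integer precisions come from quantization:
//                                     no truncation; round-to-nearest is ASSUMED.
int32_t reduce_mean_int(const std::vector<int32_t>& values, bool original_int_precisions) {
    assert(!values.empty());
    double sum = 0.0;
    for (int32_t v : values) {
        sum += v;
    }
    const double mean = sum / static_cast<double>(values.size());
    if (original_int_precisions) {
        return static_cast<int32_t>(std::trunc(mean));  // round toward zero
    }
    return static_cast<int32_t>(std::lround(mean));  // assumed: round to nearest
}
```

For `{1, 2, 2}` the mean is about 1.67: truncation yields 1, while rounding to nearest yields 2, which is the kind of accuracy difference the change is concerned with for quantized models.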
diff --git a/docs/articles_en/get-started/configurations.rst b/docs/articles_en/get-started/configurations.rst
@@ -1,6 +1,4 @@
-.. {#openvino_docs_install_guides_configurations_header}
-
-Additional Configurations For Hardware
+Additional Configurations
 ======================================
 
 
@@ -16,10 +14,10 @@ Additional Configurations For Hardware
 
    For GPU <configurations/configurations-intel-gpu>
    For NPU <configurations/configurations-intel-npu>
+   GenAI Dependencies <configurations/genai-dependencies>
 
-For certain use cases, you may need to install additional software, to use the full
-potential of OpenVINO™. Check the following list for components for elements used in
-your workflow:
+For certain use cases, you may need to install additional software to benefit from the full
+potential of OpenVINO™. Check the following list for components used in your workflow:
 
 | **GPU drivers**
 |   If you want to run inference on a GPU, make sure your GPU's drivers are properly installed.
@@ -33,6 +31,11 @@ your workflow:
     See the :doc:`guide on NPU configuration <configurations/configurations-intel-npu>`
     for details.
 
+| **OpenVINO GenAI Dependencies**
+|   OpenVINO GenAI is a flavor of OpenVINO, aiming to simplify running generative
+    AI models. For information on the dependencies required to use OpenVINO GenAI, see the
+    :doc:`guide on OpenVINO GenAI Dependencies <configurations/genai-dependencies>`.
+
 | **Open Computer Vision Library**
 |   OpenCV is used to extend the capabilities of some models, for example enhance some of
     OpenVINO samples, when used as a dependency in compilation. To install OpenCV for OpenVINO, see the
diff --git a/docs/articles_en/get-started/configurations/genai-dependencies.rst b/docs/articles_en/get-started/configurations/genai-dependencies.rst
@@ -0,0 +1,31 @@
+OpenVINO™ GenAI Dependencies
+=================================
+
+OpenVINO™ GenAI depends on both `OpenVINO <https://github.com/openvinotoolkit/openvino>`__ and
+`OpenVINO Tokenizers <https://github.com/openvinotoolkit/openvino_tokenizers>`__. When OpenVINO™
+GenAI is installed from PyPI, matching versions of OpenVINO and OpenVINO Tokenizers
+are installed with it (for example, ``openvino==2024.3.0`` and ``openvino-tokenizers==2024.3.0.0`` are installed for
+``openvino-genai==2024.3.0``).
+
+Updating any of the dependency packages independently might result in a version incompatibility
+due to different Application Binary Interfaces (ABIs), which will result in errors while running
+OpenVINO GenAI. Package versions follow the ``<MAJOR>.<MINOR>.<PATCH>.<REVISION>`` format. Only the
+``<REVISION>`` portion of the version can be changed while keeping ABI compatibility; changing the
+``<MAJOR>``, ``<MINOR>``, or ``<PATCH>`` part of the version may break ABI.
+
+GenAI, Tokenizers, and OpenVINO wheels for Linux on PyPI are compiled with ``_GLIBCXX_USE_CXX11_ABI=0``
+to cover a wider range of platforms. In the C++ archive distributions for Ubuntu, ``_GLIBCXX_USE_CXX11_ABI=1``
+is used instead. Mixing different ABIs is not possible as doing so will result in a link error.
+
+To use OpenVINO GenAI with dependency versions other than those available as prebuilt packages
+(archives or Python wheels), build the OpenVINO GenAI library from
+`source <https://github.com/openvinotoolkit/openvino.genai/blob/releases/2024/3/src/docs/BUILD.md#build-openvino-openvino-tokenizers-and-openvino-genai-from-source>`__.
+
+Additional Resources
+#######################
+
+* :doc:`OpenVINO GenAI Installation Guide <../install-openvino/install-openvino-genai>`
+* `OpenVINO GenAI repository <https://github.com/openvinotoolkit/openvino.genai>`__
+* :doc:`OpenVINO Installation Guide <../install-openvino>`
+* :doc:`OpenVINO Tokenizers <../../learn-openvino/llm_inference_guide/ov-tokenizers>`
+
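The two compatibility rules in the GenAI dependencies article (matching ``<MAJOR>.<MINOR>.<PATCH>``, and not mixing ``_GLIBCXX_USE_CXX11_ABI`` settings) can be checked programmatically. This is a minimal C++ sketch under stated assumptions: `parse_version`, `abi_compatible`, and `glibcxx_cxx11_abi` are hypothetical helpers, version fields are assumed to be purely numeric, and a missing ``<REVISION>`` is treated as 0. The check is conservative: the article says a non-revision change *may* break ABI, and the sketch treats it as incompatible.

```cpp
#include <array>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: parses "<MAJOR>.<MINOR>.<PATCH>[.<REVISION>]".
// A missing REVISION is treated as 0 (assumption for illustration).
std::array<int, 4> parse_version(const std::string& text) {
    std::array<int, 4> parts{0, 0, 0, 0};
    std::istringstream in(text);
    std::string field;
    std::size_t i = 0;
    while (i < parts.size() && std::getline(in, field, '.')) {
        parts[i++] = std::stoi(field);  // assumes numeric fields only
    }
    assert(i >= 3 && "expected at least MAJOR.MINOR.PATCH");
    return parts;
}

// Only REVISION may differ; MAJOR, MINOR and PATCH must match.
bool abi_compatible(const std::string& a, const std::string& b) {
    const auto va = parse_version(a);
    const auto vb = parse_version(b);
    return va[0] == vb[0] && va[1] == vb[1] && va[2] == vb[2];
}

// Reports the libstdc++ dual-ABI setting of this translation unit.
// _GLIBCXX_USE_CXX11_ABI is defined by libstdc++ headers only; other
// standard libraries (e.g. libc++, MSVC STL) do not define it.
int glibcxx_cxx11_abi() {
#ifdef _GLIBCXX_USE_CXX11_ABI
    return _GLIBCXX_USE_CXX11_ABI;  // 0 or 1
#else
    return -1;  // not compiled against libstdc++
#endif
}
```

For example, `abi_compatible("2024.3.0.0", "2024.3.0.1")` holds, while `abi_compatible("2024.3.0.0", "2024.4.0.0")` does not. A C++ application linking against the PyPI wheels' libraries would need `glibcxx_cxx11_abi()` to report 0, whereas the Ubuntu archives expect 1.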
diff --git a/docs/articles_en/get-started/install-openvino/install-openvino-genai.rst b/docs/articles_en/get-started/install-openvino/install-openvino-genai.rst
@@ -11,7 +11,9 @@ To see GenAI in action, check the Jupyter notebooks:
 `LLM-powered Chatbot <https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-chatbot/README.md>`__ and
 `LLM Instruction-following pipeline <https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-question-answering/README.md>`__.
 
-The OpenVINO GenAI flavor is available for installation via PyPI and Archive distributions:
+The OpenVINO GenAI flavor is available for installation via PyPI and Archive distributions.
+A `detailed guide <https://github.com/openvinotoolkit/openvino.genai/blob/releases/2024/3/src/docs/BUILD.md>`__
+on how to build OpenVINO GenAI is available in the OpenVINO GenAI repository.
 
 PyPI Installation
 ###############################
diff --git a/docs/dev/build_linux.md b/docs/dev/build_linux.md
@@ -74,7 +74,7 @@ You can use the following additional build options:
      ```
   3. After the build process finishes, export the newly built Python libraries to the user environment variables:
      ```
-     export PYTHONPATH=<openvino_repo>/bin/intel64/Release/python:$PYTHONPATH
+     export PYTHONPATH=<openvino_repo>/bin/intel64/Release/python:<openvino_repo>/tools/ovc:$PYTHONPATH
      export LD_LIBRARY_PATH=<openvino_repo>/bin/intel64/Release:$LD_LIBRARY_PATH
      ```
      or install the wheel with pip:
diff --git a/docs/dev/build_windows.md b/docs/dev/build_windows.md
@@ -61,7 +61,7 @@ Supported configurations:
      ```
   3. After the build process finishes, export the newly built Python libraries to the user environment variables:
      ```
-     set PYTHONPATH=<openvino_repo>/bin/<arch>/Release/python;%PYTHONPATH%
+     set PYTHONPATH=<openvino_repo>/bin/<arch>/Release/python;<openvino_repo>/tools/ovc;%PYTHONPATH%
      set OPENVINO_LIB_PATHS=<openvino_repo>/bin/<arch>/Release;<openvino_repo>/temp/tbb/bin
      ```
      or install the wheel with pip:
diff --git a/src/common/transformations/src/transformations/op_conversions/convert_maxpool_downgrade.cpp b/src/common/transformations/src/transformations/op_conversions/convert_maxpool_downgrade.cpp
@@ -98,23 +98,23 @@ ov::pass::ConvertMaxPool14ToMaxPool8::ConvertMaxPool14ToMaxPool8() {
             const auto strides = max_pool_v14->get_strides();
             const auto padding_begin = max_pool_v14->get_pads_begin();
             const auto padding_begin_node =
-                node_registry.make<Constant>(element::i64, Shape{padding_begin.size()}, padding_begin);
+                node_registry.make<Constant>(element::i32, Shape{padding_begin.size()}, padding_begin);
             const auto padding_end = max_pool_v14->get_pads_end();
             const auto padding_end_node =
-                node_registry.make<Constant>(element::i64, Shape{padding_end.size()}, padding_end);
-            const auto zero = node_registry.make<Constant>(element::i64, Shape{}, 0);
-            const auto one = node_registry.make<Constant>(element::i64, Shape{}, 1);
-            const auto two = node_registry.make<Constant>(element::i64, Shape{}, 2);
+                node_registry.make<Constant>(element::i32, Shape{padding_end.size()}, padding_end);
+            const auto zero = node_registry.make<Constant>(element::i32, Shape{}, 0);
+            const auto one = node_registry.make<Constant>(element::i32, Shape{}, 1);
+            const auto two = node_registry.make<Constant>(element::i32, Shape{}, 2);
 
             const auto pads_size = max_pool_v14->get_pads_begin().size();
-            const auto pads_len = node_registry.make<Constant>(element::i64, Shape{}, pads_size);
+            const auto pads_len = node_registry.make<Constant>(element::i32, Shape{}, pads_size);
             const auto pads_remaining =
-                node_registry.make<Constant>(element::i64, Shape{2}, std::vector<int64_t>{0, 0});
+                node_registry.make<Constant>(element::i32, Shape{2}, std::vector<int64_t>{0, 0});
 
             // gather input spatial dims and prepare for compare as values (in_dim + pad)
-            const auto end = node_registry.make<Constant>(element::i64, Shape{}, pads_size + 2);
-            const auto dim_idxs = node_registry.make<Range>(two, end, one, element::i64);
-            const auto shape = node_registry.make<ShapeOf>(input, element::i64);
+            const auto end = node_registry.make<Constant>(element::i32, Shape{}, pads_size + 2);
+            const auto dim_idxs = node_registry.make<Range>(two, end, one, element::i32);
+            const auto shape = node_registry.make<ShapeOf>(input, element::i32);
             const auto gth_in_dims = node_registry.make<Gather>(shape, dim_idxs, zero);
             const auto in_left_padded = node_registry.make<Add>(gth_in_dims, padding_begin_node);
 
@@ -126,10 +126,10 @@ ov::pass::ConvertMaxPool14ToMaxPool8::ConvertMaxPool14ToMaxPool8() {
                                                                     max_pool_v14->get_pads_end(),
                                                                     max_pool_v14->get_kernel(),
                                                                     ov::op::RoundingType::CEIL);
-            const auto shape_of_mp = node_registry.make<ShapeOf>(mp, element::i64);
+            const auto shape_of_mp = node_registry.make<ShapeOf>(mp, element::i32);
             const auto gth_out_dims = node_registry.make<Gather>(shape_of_mp, dim_idxs, zero);
             const auto out_sub_one = node_registry.make<Subtract>(gth_out_dims, one);
-            const auto stride_node = node_registry.make<Constant>(element::i64, Shape{strides.size()}, strides);
+            const auto stride_node = node_registry.make<Constant>(element::i32, Shape{strides.size()}, strides);
             const auto out_mul_stride = node_registry.make<Multiply>(out_sub_one, stride_node);
 
             // if (in_dim + pad) > ((out_dim - 1) * stride) sliding window in bound use end padding.
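The subgraph built by the MaxPool14 to MaxPool8 transformation encodes per-dimension integer arithmetic, and the change only switches its index precision from `i64` to `i32`. This C++ sketch reproduces that arithmetic on plain `int32_t` values for one spatial dimension. The output-size formula is the standard CEIL pooling formula; the helper names are hypothetical, and returning 0 instead of `pad_end` when the comparison fails is an assumption about how the result is used beyond the visible hunk.

```cpp
#include <cassert>
#include <cstdint>

// Output size of one spatial dim for MaxPool with CEIL rounding.
int32_t maxpool_out_dim_ceil(int32_t in_dim, int32_t kernel, int32_t stride,
                             int32_t pad_begin, int32_t pad_end, int32_t dilation) {
    assert(stride > 0 && kernel > 0 && dilation > 0);
    const int32_t effective_kernel = (kernel - 1) * dilation + 1;
    const int32_t span = in_dim + pad_begin + pad_end - effective_kernel;
    assert(span >= 0);
    return (span + stride - 1) / stride + 1;  // ceil(span / stride) + 1
}

// Mirrors the source comment: if (in_dim + pad) > ((out_dim - 1) * stride),
// the last sliding window starts inside the left-padded input, so the end
// padding is kept; otherwise 0 is used (ASSUMED consumer of the comparison).
int32_t effective_pad_end(int32_t in_dim, int32_t kernel, int32_t stride,
                          int32_t pad_begin, int32_t pad_end, int32_t dilation) {
    const int32_t out_dim =
        maxpool_out_dim_ceil(in_dim, kernel, stride, pad_begin, pad_end, dilation);
    const int32_t in_left_padded = in_dim + pad_begin;   // in_dim + pad
    const int32_t out_mul_stride = (out_dim - 1) * stride;
    return in_left_padded > out_mul_stride ? pad_end : 0;
}
```

With `in_dim = 5`, kernel 2, stride 2, and padding 1 on both sides, CEIL rounding gives 4 outputs, but the last window would start entirely in the end padding (6 is not greater than 6), so the end padding is dropped and the output shrinks to 3. This matches PyTorch's `ceil_mode` behaviour, which is what the workaround targets. Values of this size fit comfortably in `i32`.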
diff --git a/src/common/transformations/tests/op_conversions/convert_maxpool_downgrade_test.cpp b/src/common/transformations/tests/op_conversions/convert_maxpool_downgrade_test.cpp
@@ -91,20 +91,20 @@ std::shared_ptr<ov::Model> create_ceil_torch_workaround_model(const ov::op::Roun
     const ov::Strides strides{2, 2}, dilations{1, 1};
     ov::Shape pads_begin{1, 1}, pads_end{1, 1}, kernel{2, 2};
 
-    const auto padding_begin_node = Constant::create(ov::element::i64, ov::Shape{pads_begin.size()}, pads_begin);
-    const auto padding_end_node = Constant::create(ov::element::i64, ov::Shape{pads_end.size()}, pads_end);
-    const auto zero = Constant::create(ov::element::i64, ov::Shape{}, {0});
-    const auto one = Constant::create(ov::element::i64, ov::Shape{}, {1});
-    const auto two = Constant::create(ov::element::i64, ov::Shape{}, {2});
+    const auto padding_begin_node = Constant::create(ov::element::i32, ov::Shape{pads_begin.size()}, pads_begin);
+    const auto padding_end_node = Constant::create(ov::element::i32, ov::Shape{pads_end.size()}, pads_end);
+    const auto zero = Constant::create(ov::element::i32, ov::Shape{}, {0});
+    const auto one = Constant::create(ov::element::i32, ov::Shape{}, {1});
+    const auto two = Constant::create(ov::element::i32, ov::Shape{}, {2});
 
     const auto pads_size = pads_begin.size();
-    const auto pads_len = Constant::create(ov::element::i64, ov::Shape{}, {pads_size});
-    const auto pads_remaining = Constant::create(ov::element::i64, ov::Shape{2}, {0, 0});
+    const auto pads_len = Constant::create(ov::element::i32, ov::Shape{}, {pads_size});
+    const auto pads_remaining = Constant::create(ov::element::i32, ov::Shape{2}, {0, 0});
 
     // gather input spatial dims and prepare for compare as values (in_dim + pad)
-    const auto end = Constant::create(ov::element::i64, ov::Shape{}, {pads_size + 2});
-    const auto dim_idxs = std::make_shared<Range>(two, end, one, ov::element::i64);
-    const auto shape = std::make_shared<ShapeOf>(input, ov::element::i64);
+    const auto end = Constant::create(ov::element::i32, ov::Shape{}, {pads_size + 2});
+    const auto dim_idxs = std::make_shared<Range>(two, end, one, ov::element::i32);
+    const auto shape = std::make_shared<ShapeOf>(input, ov::element::i32);
     const auto gth_in_dims = std::make_shared<Gather>(shape, dim_idxs, zero);
     const auto in_left_padded = std::make_shared<Add>(gth_in_dims, padding_begin_node);
 
@@ -116,10 +116,10 @@ std::shared_ptr<ov::Model> create_ceil_torch_workaround_model(const ov::op::Roun
                                                           pads_end,
                                                           kernel,
                                                           ov::op::RoundingType::CEIL);
-    const auto shape_of_mp = std::make_shared<ShapeOf>(mp, ov::element::i64);
+    const auto shape_of_mp = std::make_shared<ShapeOf>(mp, ov::element::i32);
     const auto gth_out_dims = std::make_shared<Gather>(shape_of_mp, dim_idxs, zero);
     const auto out_sub_one = std::make_shared<Subtract>(gth_out_dims, one);
-    const auto stride_node = Constant::create(ov::element::i64, ov::Shape{strides.size()}, strides);
+    const auto stride_node = Constant::create(ov::element::i32, ov::Shape{strides.size()}, strides);
     const auto out_mul_stride = std::make_shared<Multiply>(out_sub_one, stride_node);
 
     // if (in_dim + pad) > ((out_dim - 1) * stride) sliding window in bound use end padding.
diff --git a/src/inference/include/openvino/runtime/intel_npu/level_zero/level_zero.hpp b/src/inference/include/openvino/runtime/intel_npu/level_zero/level_zero.hpp
@@ -100,7 +100,7 @@ class ZeroContext : public RemoteContext {
     }
 
     /**
-     * @brief This function is used to obtain remote tensor object from user-supplied Direct3D 12 Core object
+     * @brief This function is used to obtain remote tensor object from user-supplied NT handle object
      * @param type Tensor element type
      * @param shape Tensor shape
      * @param buffer A void* object that should be wrapped by a remote tensor
diff --git a/src/plugins/intel_gpu/src/kernel_selector/cl_kernels/reorder_data.cl b/src/plugins/intel_gpu/src/kernel_selector/cl_kernels/reorder_data.cl
@@ -27,8 +27,15 @@ KERNEL (reorder_data)(
 #endif
     )
 {
+#if INPUT0_LAYOUT_BYFX
+    // GWS_FEATURE takes Y for byfx format
+    const uint b = get_global_id(GWS_BATCH);
+    const uint y = get_global_id(GWS_FEATURE);
+#else
     const uint b = get_global_id(GWS_BATCH);
     const uint f = get_global_id(GWS_FEATURE);
+#endif
+
 #if   INPUT0_DIMS == 2
     const uint y = 0;
     const uint x = 0;
@@ -37,8 +44,14 @@ KERNEL (reorder_data)(
     const uint u = 0;
     const uint v = 0;
 #elif INPUT0_DIMS == 4
-    const uint y = ((uint)(get_global_id(GWS_YX))) / INPUT0_SIZE_X;
-    const uint x = ((uint)(get_global_id(GWS_YX))) % INPUT0_SIZE_X;
+    #if INPUT0_LAYOUT_BYFX
+        // GWS_YX takes (F and X) axes for byfx format
+        const uint f = ((uint)(get_global_id(GWS_YX))) / INPUT0_SIZE_X;
+        const uint x = ((uint)(get_global_id(GWS_YX))) % INPUT0_SIZE_X;
+    #else
+        const uint y = ((uint)(get_global_id(GWS_YX))) / INPUT0_SIZE_X;
+        const uint x = ((uint)(get_global_id(GWS_YX))) % INPUT0_SIZE_X;
+    #endif
     const uint z = 0;
     const uint w = 0;
     const uint u = 0;
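The byfx branch of `reorder_data.cl` reinterprets the global IDs: `GWS_FEATURE` yields `y`, and the flattened `GWS_YX` ID is split into `f` and `x`. This host-side C++ emulation checks that this decomposition visits every element exactly once. It assumes a dispatch of `X * F` work items along `GWS_YX`, `Y` along `GWS_FEATURE`, and `B` along `GWS_BATCH`, as suggested by the comments in the `SetDefault` change. `byfx_coord` and `covers_each_element_once` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Coord { uint32_t b, f, y, x; };

// Emulates the byfx index math of the kernel for one work item.
Coord byfx_coord(uint32_t id_batch, uint32_t id_feature, uint32_t id_yx, uint32_t size_x) {
    Coord c{};
    c.b = id_batch;
    c.y = id_feature;      // GWS_FEATURE takes Y for byfx
    c.f = id_yx / size_x;  // GWS_YX takes (F and X) axes
    c.x = id_yx % size_x;
    return c;
}

// True if the assumed dispatch (B, Y, X*F) touches every element exactly once.
bool covers_each_element_once(uint32_t B, uint32_t F, uint32_t Y, uint32_t X) {
    std::vector<int> hits(static_cast<std::size_t>(B) * F * Y * X, 0);
    for (uint32_t gb = 0; gb < B; ++gb) {
        for (uint32_t gf = 0; gf < Y; ++gf) {
            for (uint32_t gyx = 0; gyx < X * F; ++gyx) {
                const Coord c = byfx_coord(gb, gf, gyx, X);
                assert(c.f < F && c.x < X && c.y < Y);
                const std::size_t idx =
                    ((static_cast<std::size_t>(c.b) * F + c.f) * Y + c.y) * X + c.x;
                ++hits[idx];
            }
        }
    }
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```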
diff --git a/src/plugins/intel_gpu/src/kernel_selector/kernels/reorder/reorder_kernel_base.cpp b/src/plugins/intel_gpu/src/kernel_selector/kernels/reorder/reorder_kernel_base.cpp
@@ -189,6 +189,12 @@ ReorderKernelBase::DispatchData ReorderKernelBase::SetDefault(const reorder_para
         dispatchData.lws[0] = 1;
         dispatchData.lws[1] = 16;
         dispatchData.lws[2] = 1;
+    } else if (input_l == DataLayout::byfx) {
+        auto first_primary_axis_size = dispatchData.gws[0];  // X axis
+        auto second_primary_axis_size = dispatchData.gws[1];  // YF axes
+        dispatchData.gws[0] = first_primary_axis_size * input.Feature().v;  // takes XF axes
+        dispatchData.gws[1] = second_primary_axis_size / input.Feature().v;  // takes Y axis
+        dispatchData.lws = {1, 1, 1};
     }
 
     return dispatchData;
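The byfx branch added to `ReorderKernelBase::SetDefault` moves the feature extent from `gws[1]` to `gws[0]`, so the kernel can split `gws[0]` into `f` and `x`. This C++ sketch reproduces that rearrangement on a plain array. It assumes, as the source comments state, that the incoming `gws[0]` holds X and `gws[1]` holds Y*F. `rearrange_gws_for_byfx` is a hypothetical name, and the source additionally sets `lws` to `{1, 1, 1}`, which is not modelled here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Gws = std::array<std::size_t, 3>;

// Sketch of the byfx branch: X -> X*F in gws[0], Y*F -> Y in gws[1].
Gws rearrange_gws_for_byfx(Gws gws, std::size_t feature) {
    assert(feature > 0 && gws[1] % feature == 0);  // Y*F must divide evenly by F
    const std::size_t x_size = gws[0];             // X axis
    const std::size_t yf_size = gws[1];            // YF axes
    gws[0] = x_size * feature;                     // takes XF axes
    gws[1] = yf_size / feature;                    // takes Y axis
    return gws;
}
```

The total number of work items is unchanged; only the split across dimensions differs, which is what allows the kernel's `GWS_YX / INPUT0_SIZE_X` and `GWS_YX % INPUT0_SIZE_X` decomposition to recover `f` and `x`.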
diff --git a/src/plugins/intel_gpu/tests/unit/fusions/gemm_fusion_test.cpp b/src/plugins/intel_gpu/tests/unit/fusions/gemm_fusion_test.cpp
@@ -142,6 +142,8 @@ class GemmFusingTest : public ::BaseFusingTest<gemm_test_params> {
 #define CASE_GEMM_PERMUTES_FUSION_FP16_3 { { 17, 11, 2, 18 }, { 17, 11, 18, 4 } }, { 17, 11, 2, 4 }, data_types::f16, data_types::f16, data_types::f16, format::bfyx, data_types::f16, format::bfyx
 #define CASE_GEMM_PERMUTES_FUSION_FP16_4 { { 3, 2, 10, 12 }, { 3, 2, 12, 20 } }, { 3, 2, 10, 20 }, data_types::f16, data_types::f16, data_types::f16, format::bfyx, data_types::f16, format::bfyx
 #define CASE_GEMM_PERMUTES_FUSION_FP16_5 { { 3, 2, 16, 32 }, { 3, 2, 32, 16} }, { 3, 2, 16, 16 }, data_types::f16, data_types::f16, data_types::f16, format::bfyx, data_types::f16, format::bfyx
+#define CASE_GEMM_PERMUTES_FUSION_FP16_6 { { 3, 2, 16, 32 },  { 3, 16, 2, 32} }, { 3, 2, 2, 32 }, data_types::f16, data_types::f16, data_types::f16, format::bfyx, data_types::f16, format::bfyx
+
 class gemm_3in_quantize_i8 : public GemmFusingTest {};
 TEST_P(gemm_3in_quantize_i8, basic) {
     // TODO: Fix me, refer PR(#15873)
@@ -757,4 +759,40 @@ INSTANTIATE_TEST_SUITE_P(
         gemm_test_params{CASE_GEMM_PERMUTES_FUSION_FP16_3, 3, 6, "", broadcast_kinds::feature/*dummy*/, eltwise_mode::sum/*dummy*/, {{0, 2, 1, 3} /*byfx*/, {1, 2, 3, 0} /*xbfy*/, {0, 2, 1, 3} /*byfx*/}},
     }));
 
+class permute_gemm_reorder : public GemmFusingTestOneDNN {};
+TEST_P(permute_gemm_reorder, fused_permute_gemm_with_reorder) {
+    auto p = GetParam();
+    auto in_lay0 = get_input_layout(p, 0);
+    auto in_lay1 = get_input_layout(p, 1);
+    auto permute_in_lay0 = get_permute_input_shape(in_lay0.get_shape(), p.permute_orders[0]);
+    auto permute_in_lay1 = get_permute_input_shape(in_lay1.get_shape(), p.permute_orders[1]);
+    in_lay0.set_partial_shape(permute_in_lay0);
+    in_lay1.set_partial_shape(permute_in_lay1);
+    create_topologies(
+        input_layout("input0", in_lay0),
+        input_layout("input1", in_lay1),
+        permute("permute0", input_info("input0"), p.permute_orders[0]),
+        reorder("reorder_permute", input_info("permute0"), p.default_format, data_types::f32),
+        permute("permute1", input_info("input1"), p.permute_orders[1]),
+        gemm("gemm_prim", { input_info("permute0"), input_info("permute1") }, data_types::f16),
+        reorder("reorder_bfyx", input_info("gemm_prim"), p.default_format, data_types::f32),
+        eltwise("eltwise", { input_info("reorder_permute"), input_info("reorder_bfyx") }, eltwise_mode::sum, data_types::f32)
+    );
+
+    tolerance = default_tolerance(data_types::f16);
+    execute(p, false);
+}
+
+#define CASE_PERMUTES_GEMM_FUSION_FP16_1 { { 1, 12, 20, 64 }, { 1, 12, 64, 64 } }, { 1, 12, 20, 64 }, data_types::f16, data_types::f16, data_types::f16, format::bfyx, data_types::f16, format::bfyx
+#define CASE_PERMUTES_GEMM_FUSION_FP16_2 {  { 3, 2, 10, 12 },   { 3, 2, 12, 1 } },   { 3, 2, 10, 1 }, data_types::f16, data_types::f16, data_types::f16, format::bfyx, data_types::f16, format::bfyx
+
+INSTANTIATE_TEST_SUITE_P(
+    fusings_gpu, permute_gemm_reorder, ::testing::ValuesIn(std::vector<gemm_test_params>{
+        gemm_test_params{CASE_PERMUTES_GEMM_FUSION_FP16_1, 4, 6, "", broadcast_kinds::feature/*dummy*/, eltwise_mode::sum/*dummy*/, {{0, 2, 1, 3} /*byfx*/, {0, 2, 1, 3} /*byfx*/}},
+        gemm_test_params{CASE_PERMUTES_GEMM_FUSION_FP16_1, 4, 6, "", broadcast_kinds::feature/*dummy*/, eltwise_mode::sum/*dummy*/, {{0, 2, 1, 3} /*byfx*/, {1, 2, 3, 0} /*xbfy*/}},
+        gemm_test_params{CASE_PERMUTES_GEMM_FUSION_FP16_2, 4, 6, "", broadcast_kinds::feature/*dummy*/, eltwise_mode::sum/*dummy*/, {{0, 2, 1, 3} /*byfx*/, {0, 2, 1, 3} /*byfx*/}},
+        gemm_test_params{CASE_PERMUTES_GEMM_FUSION_FP16_2, 4, 6, "", broadcast_kinds::feature/*dummy*/, eltwise_mode::sum/*dummy*/, {{0, 2, 1, 3} /*byfx*/, {1, 2, 3, 0} /*xbfy*/}},
+    }));
+
+
 #endif // ENABLE_ONEDNN_FOR_GPU
diff --git a/src/plugins/intel_npu/src/backend/src/zero_init.cpp b/src/plugins/intel_npu/src/backend/src/zero_init.cpp
@@ -10,6 +10,12 @@
 #include "ze_command_queue_npu_ext.h"
 #include "zero_utils.hpp"
 
+#ifdef _WIN32
+namespace {
+constexpr uint32_t WIN_DRIVER_NO_MCL_SUPPORT = 2688;
+}  // namespace
+#endif
+
 namespace intel_npu {
 
 const ze_driver_uuid_t ZeroInitStructsHolder::uuid = ze_intel_npu_driver_uuid;
@@ -169,9 +175,15 @@ ZeroInitStructsHolder::ZeroInitStructsHolder() : log("NPUZeroInitStructsHolder",
         std::make_unique<ze_graph_dditable_ext_decorator>(graph_ddi_table_ext, driver_ext_version);
 
     // Query the mutable command list version
-    std::string mutable_comamnd_list_name;
-    log.debug("ZeroInitStructsHolder - tie output of queryMutableCommandListVersion");
-    mutable_command_list_version = queryMutableCommandListVersion(extProps, count);
+#ifdef _WIN32
+    // The 2688 Windows driver version does not support the MutableCommandList feature as expected
+    if (driver_properties.driverVersion != WIN_DRIVER_NO_MCL_SUPPORT) {
+#endif
+        log.debug("ZeroInitStructsHolder - tie output of queryMutableCommandListVersion");
+        mutable_command_list_version = queryMutableCommandListVersion(extProps, count);
+#ifdef _WIN32
+    }
+#endif
 
     log.debug("Mutable command list version %d.%d",
               ZE_MAJOR_VERSION(mutable_command_list_version),
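The NPU change skips the mutable command list (MCL) version query on one specific Windows driver version, and the result is then logged as major/minor parts. This C++ sketch models both pieces. The bit layout mirrors Level Zero's `ZE_MAKE_VERSION`, `ZE_MAJOR_VERSION`, and `ZE_MINOR_VERSION` macros (major in the upper 16 bits, minor in the lower 16), re-implemented locally so the sketch compiles without `ze_api.h`. `effective_mcl_version` is hypothetical: it uses a runtime `is_windows` flag where the source uses `#ifdef _WIN32`, and it assumes the version keeps an initial value of 0 when the query is skipped.

```cpp
#include <cassert>
#include <cstdint>

// Local re-implementation of the Level Zero version bit layout.
constexpr uint32_t make_version(uint32_t major, uint32_t minor) {
    return (major << 16) | (minor & 0xFFFFu);
}
constexpr uint32_t major_version(uint32_t v) { return v >> 16; }
constexpr uint32_t minor_version(uint32_t v) { return v & 0xFFFFu; }

constexpr uint32_t kWinDriverNoMclSupport = 2688;  // value from zero_init.cpp

// Hypothetical helper: the queried MCL version is used unless the driver is
// the excluded Windows build, in which case it stays 0 (ASSUMED initial value).
uint32_t effective_mcl_version(bool is_windows, uint32_t driver_version, uint32_t queried_version) {
    if (is_windows && driver_version == kWinDriverNoMclSupport) {
        return 0;  // MCL treated as unsupported
    }
    return queried_version;
}
```

A version of 0 decodes to `0.0`, which is how the subsequent debug log would report a disabled MCL feature under this assumption.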
diff --git a/src/plugins/intel_npu/src/backend/src/zero_pipeline.cpp b/src/plugins/intel_npu/src/backend/src/zero_pipeline.cpp
@@ -275,7 +275,7 @@ struct IntegratedPipeline final : public Pipeline {
     };
 
     void updateCommandList(const TensorData& tensors_data, uint32_t index, size_t batch_size) override {
-        OV_ITT_TASK_CHAIN(ZERO_EXECUTOR_IP_PULL,
+        OV_ITT_TASK_CHAIN(ZERO_EXECUTOR_IP_UMCL,
                           itt::domains::LevelZeroBackend,
                           "IntegratedPipeline",
                           "updateCommandList");
diff --git a/src/plugins/intel_npu/src/plugin/npuw/partitioning/partitioning.cpp b/src/plugins/intel_npu/src/plugin/npuw/partitioning/partitioning.cpp
diff --git a/src/plugins/intel_npu/src/plugin/npuw/partitioning/patterns/dcoff.cpp b/src/plugins/intel_npu/src/plugin/npuw/partitioning/patterns/dcoff.cpp
diff --git a/src/plugins/intel_npu/src/plugin/npuw/partitioning/patterns/dcoff.hpp b/src/plugins/intel_npu/src/plugin/npuw/partitioning/patterns/dcoff.hpp
