Commit 1176df0

Update genai-guide-npu.rst (#26125)
### Details:
- *item1*
- *...*

### Tickets:
- *ticket-id*

---------

Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
1 parent 808b7e9 commit 1176df0

1 file changed: +34 -38 lines changed

1 file changed

+34
-38
lines changed

docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst

@@ -8,60 +8,57 @@ This guide will give you extra details on how to utilize NPU with the GenAI flav
 :doc:`See the installation guide <../../get-started/install-openvino/install-openvino-genai>`
 for information on how to start.

-Export an LLM model via Hugging Face Optimum-Intel
-##################################################
-
-1. Create a python virtual environment and install the correct components for exporting a model:
-
-   .. code-block:: console
+Prerequisites
+#############

-      python -m venv export-npu-env
-      export-npu-env\Scripts\activate
-      pip install transformers>=4.42.4 openvino==2024.2.0 openvino-tokenizers==2024.2.0 nncf==2.11.0 onnx==1.16.1 optimum-intel@git+https://github.com/huggingface/optimum-intel.git
+Install required dependencies:

-2. A chat-tuned TinyLlama model is used in this example. The following conversion & optimization settings are recommended when using the NPU:
+.. code-block:: console

-   .. code-block:: python
+   python -m venv npu-env
+   npu-env\Scripts\activate
+   pip install optimum-intel nncf==2.11 onnx==1.16.1
+   pip install --pre openvino==2024.3.0.dev20240807 openvino-tokenizers==2024.3.0.0.dev20240807 openvino-genai==2024.3.0.0.dev20240807 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release

-      optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama
+
+Export an LLM model via Hugging Face Optimum-Intel
+##################################################

-Run generation using OpenVINO GenAI
-##########################################
+A chat-tuned TinyLlama model is used in this example. The following conversion & optimization settings are recommended when using the NPU:

-1. Create a python virtual environment and install the correct components for running the model on the NPU via OpenVINO GenAI:
+.. code-block:: python

-   .. code-block:: console
+   optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama

-      python -m venv run-npu-env
-      run-npu-env\Scripts\activate
-      pip install --pre openvino==2024.3.0.dev20240807 openvino-tokenizers==2024.3.0.0.dev20240807 openvino-genai==2024.3.0.0.dev20240807 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
+Run generation using OpenVINO GenAI
+###################################

-2. Perform generation using the new GenAI API
+Use the following code snippet to perform generation with OpenVINO GenAI API:

-   .. tab-set::
+.. tab-set::

-      .. tab-item:: Python
-         :sync: py
+   .. tab-item:: Python
+      :sync: py

-         .. code-block:: python
+      .. code-block:: python

-            import openvino_genai as ov_genai
-            pipe = ov_genai.LLMPipeline(model_path, "NPU")
-            print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
+         import openvino_genai as ov_genai
+         pipe = ov_genai.LLMPipeline(model_path, "NPU")
+         print(pipe.generate("The Sun is yellow because", max_new_tokens=100))

-      .. tab-item:: C++
-         :sync: cpp
+   .. tab-item:: C++
+      :sync: cpp

-         .. code-block:: cpp
+      .. code-block:: cpp

-            #include "openvino/genai/llm_pipeline.hpp"
-            #include <iostream>
+         #include "openvino/genai/llm_pipeline.hpp"
+         #include <iostream>

-            int main(int argc, char* argv[]) {
-               std::string model_path = argv[1];
-               ov::genai::LLMPipeline pipe(model_path, "NPU");
-               std::cout << pipe.generate("What is OpenVINO?", ov::genai::max_new_tokens(100));
-            }
+         int main(int argc, char* argv[]) {
+            std::string model_path = argv[1];
+            ov::genai::LLMPipeline pipe(model_path, "NPU");
+            std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(100));
+         }

 Additional configuration options
 ################################

@@ -97,4 +94,3 @@ Additional Resources
 * :doc:`NPU Device <../../openvino-workflow/running-inference/inference-devices-and-modes/npu-device>`
 * `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
 * `Neural Network Compression Framework <https://github.com/openvinotoolkit/nncf>`__
-
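Note that the updated Python snippet uses `model_path` without defining it; it is expected to point at the directory produced by the `optimum-cli` export step. Below is a minimal end-to-end sketch under that assumption: the `TinyLlama` folder name matches the export command in the diff, and the chat-session calls (`start_chat`/`finish_chat` from the `openvino_genai` API) are an optional, illustrative addition for the chat-tuned model, not part of this commit.

   import openvino_genai as ov_genai

   # Directory produced by the export step:
   #   optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
   #     --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama
   model_path = "TinyLlama"  # assumed location of the exported model

   # Compile the INT4 model for the NPU device
   pipe = ov_genai.LLMPipeline(model_path, "NPU")

   # One-shot generation, as in the snippet from the diff
   print(pipe.generate("The Sun is yellow because", max_new_tokens=100))

   # For a chat-tuned model, a chat session keeps history and applies
   # the chat template between turns
   pipe.start_chat()
   print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
   pipe.finish_chat()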