Commit 1176df0

Update genai-guide-npu.rst (#26125)
### Details:
- *item1*
- *...*

### Tickets:
- *ticket-id*

---------

Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
1 parent 808b7e9 commit 1176df0

1 file changed: +34 -38 lines changed

1 file changed

+34
-38
lines changed

docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst

@@ -8,60 +8,57 @@ This guide will give you extra details on how to utilize NPU with the GenAI flav
 :doc:`See the installation guide <../../get-started/install-openvino/install-openvino-genai>`
 for information on how to start.

-Export an LLM model via Hugging Face Optimum-Intel
-##################################################
-
-1. Create a python virtual environment and install the correct components for exporting a model:
-
-   .. code-block:: console
+Prerequisites
+#############

-      python -m venv export-npu-env
-      export-npu-env\Scripts\activate
-      pip install transformers>=4.42.4 openvino==2024.2.0 openvino-tokenizers==2024.2.0 nncf==2.11.0 onnx==1.16.1 optimum-intel@git+https://github.com/huggingface/optimum-intel.git
+Install required dependencies:

-2. A chat-tuned TinyLlama model is used in this example. The following conversion & optimization settings are recommended when using the NPU:
+.. code-block:: console

-   .. code-block:: python
+   python -m venv npu-env
+   npu-env\Scripts\activate
+   pip install optimum-intel nncf==2.11 onnx==1.16.1
+   pip install --pre openvino==2024.3.0.dev20240807 openvino-tokenizers==2024.3.0.0.dev20240807 openvino-genai==2024.3.0.0.dev20240807 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release

-      optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama
+
+Export an LLM model via Hugging Face Optimum-Intel
+##################################################

-Run generation using OpenVINO GenAI
-##########################################
+A chat-tuned TinyLlama model is used in this example. The following conversion & optimization settings are recommended when using the NPU:

-1. Create a python virtual environment and install the correct components for running the model on the NPU via OpenVINO GenAI:
+.. code-block:: python

-   .. code-block:: console
+   optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama

-      python -m venv run-npu-env
-      run-npu-env\Scripts\activate
-      pip install --pre openvino==2024.3.0.dev20240807 openvino-tokenizers==2024.3.0.0.dev20240807 openvino-genai==2024.3.0.0.dev20240807 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
+Run generation using OpenVINO GenAI
+###################################

-2. Perform generation using the new GenAI API
+Use the following code snippet to perform generation with OpenVINO GenAI API:

-   .. tab-set::
+.. tab-set::

-      .. tab-item:: Python
-         :sync: py
+   .. tab-item:: Python
+      :sync: py

-         .. code-block:: python
+      .. code-block:: python

-            import openvino_genai as ov_genai
-            pipe = ov_genai.LLMPipeline(model_path, "NPU")
-            print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
+         import openvino_genai as ov_genai
+         pipe = ov_genai.LLMPipeline(model_path, "NPU")
+         print(pipe.generate("The Sun is yellow because", max_new_tokens=100))

-      .. tab-item:: C++
-         :sync: cpp
+   .. tab-item:: C++
+      :sync: cpp

-         .. code-block:: cpp
+      .. code-block:: cpp

-            #include "openvino/genai/llm_pipeline.hpp"
-            #include <iostream>
+         #include "openvino/genai/llm_pipeline.hpp"
+         #include <iostream>

-            int main(int argc, char* argv[]) {
-               std::string model_path = argv[1];
-               ov::genai::LLMPipeline pipe(model_path, "NPU");
-               std::cout << pipe.generate("What is OpenVINO?", ov::genai::max_new_tokens(100));
-            }
+         int main(int argc, char* argv[]) {
+            std::string model_path = argv[1];
+            ov::genai::LLMPipeline pipe(model_path, "NPU");
+            std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(100));
+         }

 Additional configuration options
 ################################

@@ -97,4 +94,3 @@ Additional Resources
 * :doc:`NPU Device <../../openvino-workflow/running-inference/inference-devices-and-modes/npu-device>`
 * `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
 * `Neural Network Compression Framework <https://github.com/openvinotoolkit/nncf>`__
-
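Note that the updated Python snippet uses `model_path` without defining it; it is expected to point at the directory produced by the `optimum-cli` export step. Below is a minimal end-to-end sketch under that assumption: the `TinyLlama` folder name matches the export command in the diff, and the chat-session calls (`start_chat`/`finish_chat` from the `openvino_genai` API) are an optional, illustrative addition for the chat-tuned model, not part of this commit.

   import openvino_genai as ov_genai

   # Directory produced by the export step:
   #   optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
   #     --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama
   model_path = "TinyLlama"  # assumed location of the exported model

   # Compile the INT4 model for the NPU device
   pipe = ov_genai.LLMPipeline(model_path, "NPU")

   # One-shot generation, as in the snippet from the diff
   print(pipe.generate("The Sun is yellow because", max_new_tokens=100))

   # For a chat-tuned model, a chat session keeps history and applies
   # the chat template between turns
   pipe.start_chat()
   print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
   pipe.finish_chat()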