@@ -8,60 +8,57 @@ This guide will give you extra details on how to utilize NPU with the GenAI flav
:doc:`See the installation guide <../../get-started/install-openvino/install-openvino-genai>`
for information on how to start.

-Export an LLM model via Hugging Face Optimum-Intel
-##################################################
-
-1. Create a python virtual environment and install the correct components for exporting a model:
-
-   .. code-block:: console
+Prerequisites
+#############
-
-      python -m venv export-npu-env
-      export-npu-env\Scripts\activate
-      pip install transformers>=4.42.4 openvino==2024.2.0 openvino-tokenizers==2024.2.0 nncf==2.11.0 onnx==1.16.1 optimum-intel@git+https://github.com/huggingface/optimum-intel.git
+
+Install required dependencies:
-2. A chat-tuned TinyLlama model is used in this example. The following conversion & optimization settings are recommended when using the NPU:
+
+.. code-block:: console

-   .. code-block:: python
+
+   python -m venv npu-env
+   npu-env\Scripts\activate
+   pip install optimum-intel nncf==2.11 onnx==1.16.1
+   pip install --pre openvino==2024.3.0.dev20240807 openvino-tokenizers==2024.3.0.0.dev20240807 openvino-genai==2024.3.0.0.dev20240807 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
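+
+To make sure the environment is set up correctly, you can, for example, print the
+installed OpenVINO version (``get_version()`` is part of the OpenVINO Python API;
+the exact build string reported on your machine will differ):
+
+.. code-block:: python
+
+   import openvino as ov
+
+   # A 2024.3 pre-release build should be reported here.
+   print(ov.get_version())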
-      optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama
+
+Export an LLM model via Hugging Face Optimum-Intel
+##################################################

-Run generation using OpenVINO GenAI
-##########################################
+
+A chat-tuned TinyLlama model is used in this example. The following conversion & optimization settings are recommended when using the NPU:

-1. Create a python virtual environment and install the correct components for running the model on the NPU via OpenVINO GenAI:
+
+.. code-block:: python

-   .. code-block:: console
+
+   optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama
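+
+If you prefer to stay in Python, roughly the same export can be written against the
+Optimum-Intel API (a sketch; ``OVWeightQuantizationConfig`` mirrors the CLI flags
+above, but note that ``optimum-cli`` additionally converts the tokenizer into the
+OpenVINO format that GenAI expects):
+
+.. code-block:: python
+
+   from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
+   from transformers import AutoTokenizer
+
+   model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+
+   # int4 symmetric weight compression, group size 128, applied to all layers.
+   config = OVWeightQuantizationConfig(bits=4, sym=True, group_size=128, ratio=1.0)
+
+   model = OVModelForCausalLM.from_pretrained(model_id, export=True, quantization_config=config)
+   model.save_pretrained("TinyLlama")
+   AutoTokenizer.from_pretrained(model_id).save_pretrained("TinyLlama")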
-
-      python -m venv run-npu-env
-      run-npu-env\Scripts\activate
-      pip install --pre openvino==2024.3.0.dev20240807 openvino-tokenizers==2024.3.0.0.dev20240807 openvino-genai==2024.3.0.0.dev20240807 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
+
+Run generation using OpenVINO GenAI
+###################################

-2. Perform generation using the new GenAI API
+
+Use the following code snippet to perform generation with the OpenVINO GenAI API:
-   .. tab-set::
+.. tab-set::

-      .. tab-item:: Python
-         :sync: py
+   .. tab-item:: Python
+      :sync: py

-         .. code-block:: python
+      .. code-block:: python

-            import openvino_genai as ov_genai
-            pipe = ov_genai.LLMPipeline(model_path, "NPU")
-            print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
+         import openvino_genai as ov_genai
+         pipe = ov_genai.LLMPipeline(model_path, "NPU")
+         print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
-      .. tab-item:: C++
-         :sync: cpp
+   .. tab-item:: C++
+      :sync: cpp

-         .. code-block:: cpp
+      .. code-block:: cpp

-            #include "openvino/genai/llm_pipeline.hpp"
-            #include <iostream>
+         #include "openvino/genai/llm_pipeline.hpp"
+         #include <iostream>

-            int main(int argc, char* argv[]) {
-               std::string model_path = argv[1];
-               ov::genai::LLMPipeline pipe(model_path, "NPU");
-               std::cout << pipe.generate("What is OpenVINO?", ov::genai::max_new_tokens(100));
-            }
+         int main(int argc, char* argv[]) {
+            std::string model_path = argv[1];
+            ov::genai::LLMPipeline pipe(model_path, "NPU");
+            std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(100));
+         }
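+
+Since the exported TinyLlama model is chat-tuned, you may also wrap generation in the
+pipeline's chat session so the model's chat template is applied (a minimal sketch
+using the ``start_chat``/``finish_chat`` calls from the GenAI API):
+
+.. code-block:: python
+
+   import openvino_genai as ov_genai
+
+   pipe = ov_genai.LLMPipeline(model_path, "NPU")
+
+   pipe.start_chat()  # prompts are now formatted with the model's chat template
+   print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
+   pipe.finish_chat()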
Additional configuration options
################################
@@ -97,4 +94,3 @@ Additional Resources
* :doc:`NPU Device <../../openvino-workflow/running-inference/inference-devices-and-modes/npu-device>`
* `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
* `Neural Network Compression Framework <https://github.com/openvinotoolkit/nncf>`__
-