
Commit cf53311

[DOCS] tiny article name change (openvinotoolkit#25911)
1 parent fe110fc commit cf53311

File tree

9 files changed: +132, -70 lines changed


docs/articles_en/get-started/install-openvino/install-openvino-docker-linux.rst (+4 -4)

@@ -1,13 +1,13 @@
-Install Intel® Distribution of OpenVINO™ toolkit from a Docker Image
+Install Intel® Distribution of OpenVINO™ Toolkit From a Docker Image
 =======================================================================
 
 .. meta::
    :description: Learn how to use a prebuilt Docker image or create an image
                  manually to install OpenVINO™ Runtime on Linux and Windows operating systems.
 
-This guide presents information on how to use a pre-built Docker image/create an image manually to install OpenVINO™ Runtime.
-
-Supported host operating systems for the Docker Base image:
+This guide presents information on how to use a pre-built Docker image or create a new image
+manually, to install OpenVINO™ Runtime. The supported host operating systems for the Docker
+base image are:
 
 - Linux
 - Windows (WSL2)

docs/articles_en/learn-openvino/llm_inference_guide.rst (+2 -4)

@@ -1,5 +1,3 @@
-.. {#native_vs_hugging_face_api}
-
 Large Language Model Inference Guide
 ========================================
 
@@ -14,8 +12,8 @@ Large Language Model Inference Guide
    :hidden:
 
    Run LLMs with Optimum Intel <llm_inference_guide/llm-inference-hf>
-   Run LLMs with OpenVINO GenAI Flavor <llm_inference_guide/genai-guide>
-   Run LLMs with Base OpenVINO <llm_inference_guide/llm-inference-native-ov>
+   Run LLMs on OpenVINO GenAI Flavor <llm_inference_guide/genai-guide>
+   Run LLMs on Base OpenVINO <llm_inference_guide/llm-inference-native-ov>
    OpenVINO Tokenizers <llm_inference_guide/ov-tokenizers>
 
 Large Language Models (LLMs) like GPT are transformative deep learning networks capable of a

docs/articles_en/learn-openvino/llm_inference_guide/genai-guide.rst (+2 -2)

@@ -1,5 +1,5 @@
-Run LLMs with OpenVINO GenAI Flavor
-=====================================
+Run LLM Inference on OpenVINO with the GenAI Flavor
+===============================================================================================
 
 .. meta::
    :description: Learn how to use the OpenVINO GenAI flavor to execute LLM models.

docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst (+5 -3)

@@ -1,7 +1,9 @@
-.. {#llm_inference}
-
 Run LLMs with Hugging Face and Optimum Intel
-=====================================================
+===============================================================================================
+
+.. meta::
+   :description: Learn how to use the native OpenVINO package to execute LLM models.
+
 
 The steps below show how to load and infer LLMs from
 `Hugging Face <https://huggingface.co/models>`__ using Optimum Intel. They also show how to
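
As a quick illustration of the flow that page documents, a minimal sketch (the model ID below is a placeholder assumption, not something this commit prescribes):

# Load a Hugging Face LLM through Optimum Intel, converting it to OpenVINO on
# the fly, then generate text. Any supported causal LM from the Hub works.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model ID
# export=True converts the PyTorch checkpoint to OpenVINO IR during loading
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))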

docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-native-ov.rst (+2 -4)

@@ -1,7 +1,5 @@
-.. {#llm_inference_native_ov}
-
-Run LLMs with Base OpenVINO
-===============================
+Run LLM Inference on Native OpenVINO (not recommended)
+===============================================================================================
 
 To run Generative AI models using native OpenVINO APIs you need to follow regular
 **Convert -> Optimize -> Deploy** path with a few simplifications.
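
In outline, the deploy step of that path compiles a converted IR with the core runtime; tokenization and the generation loop remain the application's job. A minimal sketch, assuming a previously converted model file (path and device are placeholders):

# Deploy step of the Convert -> Optimize -> Deploy path with base OpenVINO.
# The IR path and device name are placeholder assumptions.
import openvino as ov

core = ov.Core()
model = core.read_model("openvino_model.xml")  # placeholder path to converted IR
compiled_model = core.compile_model(model, "CPU")

# With the base API, the application must supply token IDs itself and drive the
# generation loop one step at a time; see OpenVINO Tokenizers for tokenization.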

docs/articles_en/learn-openvino/llm_inference_guide/ov-tokenizers.rst (+5 -7)

@@ -1,12 +1,10 @@
-.. {#tokenizers}
-
 OpenVINO Tokenizers
 ===============================
 
-Tokenization is a necessary step in text processing using various models, including text generation with LLMs.
-Tokenizers convert the input text into a sequence of tokens with corresponding IDs, so that
-the model can understand and process it during inference. The transformation of a sequence of numbers into a
-string is calleddetokenization.
+Tokenization is a necessary step in text processing using various models, including text
+generation with LLMs. Tokenizers convert the input text into a sequence of tokens with
+corresponding IDs, so that the model can understand and process it during inference. The
+transformation of a sequence of numbers into a string is called detokenization.
 
 .. image:: ../../assets/images/tokenization.svg
    :align: center
@@ -338,7 +336,7 @@ Additional Resources
 
 * `OpenVINO Tokenizers repo <https://github.com/openvinotoolkit/openvino_tokenizers>`__
 * `OpenVINO Tokenizers Notebook <https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/openvino-tokenizers>`__
-* `Text generation C++ samples that support most popular models like LLaMA 2 <https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/cpp/greedy_causal_lm>`__
+* `Text generation C++ samples that support most popular models like LLaMA 2 <https://github.com/openvinotoolkit/openvino.genai/tree/master/text_generation/causal_lm/cpp>`__
 * `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
 
 
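To make the tokenize/detokenize round trip described above concrete, a minimal sketch with openvino_tokenizers (the Hugging Face model ID is a placeholder assumption):

# Convert a Hugging Face tokenizer into OpenVINO tokenizer/detokenizer models,
# then run the text -> token IDs -> text round trip. Placeholder model ID.
import openvino as ov
from openvino_tokenizers import convert_tokenizer
from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model ID
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)

core = ov.Core()
tokenizer = core.compile_model(ov_tokenizer, "CPU")
detokenizer = core.compile_model(ov_detokenizer, "CPU")

token_ids = tokenizer(["Hello, OpenVINO!"])["input_ids"]  # tokenization
text = detokenizer(token_ids)["string_output"]            # detokenization
print(text)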

docs/articles_en/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.rst (+11 -7)

@@ -1,6 +1,4 @@
-.. {#openvino_docs_OV_UG_Hetero_execution}
-
-Heterogeneous execution
+Heterogeneous Execution
 =======================
 
 
@@ -9,22 +7,28 @@ Heterogeneous execution
    the inference of one model on several computing devices.
 
 
-Heterogeneous execution enables executing inference of one model on several devices. Its purpose is to:
+Heterogeneous execution enables executing inference of one model on several devices.
+Its purpose is to:
 
-* Utilize the power of accelerators to process the heaviest parts of the model and to execute unsupported operations on fallback devices, like the CPU.
+* Utilize the power of accelerators to process the heaviest parts of the model and to execute
+  unsupported operations on fallback devices, like the CPU.
 * Utilize all available hardware more efficiently during one inference.
 
 Execution via the heterogeneous mode can be divided into two independent steps:
 
 1. Setting hardware affinity to operations (`ov::Core::query_model <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html#doxid-classov-1-1-core-1acdf8e64824fe4cf147c3b52ab32c1aab>`__ is used internally by the Hetero device).
 2. Compiling a model to the Heterogeneous device assumes splitting the model to parts, compiling them on the specified devices (via `ov::device::priorities <https://docs.openvino.ai/2024/api/c_cpp_api/structov_1_1device_1_1_priorities.html>`__), and executing them in the Heterogeneous mode. The model is split to subgraphs in accordance with the affinities, where a set of connected operations with the same affinity is to be a dedicated subgraph. Each subgraph is compiled on a dedicated device and multiple `ov::CompiledModel <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_compiled_model.html#doxid-classov-1-1-compiled-model>`__ objects are made, which are connected via automatically allocated intermediate tensors.
 
+If you set pipeline parallelism (via ``ov::hint::model_distribution_policy``), the model is split into multiple stages, and each stage is assigned to a different device. The output of one stage is fed as input to the next stage.
+
 These two steps are not interconnected and affinities can be set in one of two ways, used separately or in combination (as described below): in the ``manual`` or the ``automatic`` mode.
 
-Defining and Configuring the Hetero Device
+Defining and configuring the Hetero device
 ##########################################
 
-Following the OpenVINO™ naming convention, the Hetero execution plugin is assigned the label of ``"HETERO".`` It may be defined with no additional parameters, resulting in defaults being used, or configured further with the following setup options:
+Following the OpenVINO™ naming convention, the Hetero execution plugin is assigned the label of
+``"HETERO".`` It may be defined with no additional parameters, resulting in defaults being used,
+or configured further with the following setup options:
 
 
 +-------------------------------+--------------------------------------------+-----------------------------------------------------------+
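
For reference, the simplest way to use the Hetero device from Python is to pass a device priority list at compile time (a minimal sketch; the model path and the GPU,CPU priority order are placeholder assumptions):

# Compile one model heterogeneously: subgraphs run on GPU where supported and
# fall back to CPU otherwise. Model path and device order are placeholders.
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder IR path
compiled_model = core.compile_model(model, "HETERO:GPU,CPU")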

docs/articles_en/openvino-workflow/running-inference/optimize-inference/optimize-preprocessing/torchvision-preprocessing-converter.rst (+8 -9)

@@ -1,6 +1,4 @@
-.. {#torchvision_preprocessing_converter}
-
-Torchvision preprocessing converter
+Torchvision Preprocessing Converter
 =======================================
 
 
@@ -9,13 +7,14 @@ Torchvision preprocessing converter
    to optimize model inference.
 
 
-The Torchvision-to-OpenVINO converter enables automatic translation of operators from the torchvision
-preprocessing pipeline to the OpenVINO format and embed them in your model. It is often used to adjust
-images serving as input for AI models to have proper dimensions or data types.
+The Torchvision-to-OpenVINO converter enables automatic translation of operators from the
+torchvision preprocessing pipeline to the OpenVINO format and embed them in your model. It is
+often used to adjust images serving as input for AI models to have proper dimensions or data
+types.
 
-As the converter is fully based on the **openvino.preprocess** module, you can implement the **torchvision.transforms**
-feature easily and without the use of external libraries, reducing the overall application complexity
-and enabling additional performance optimizations.
+As the converter is fully based on the **openvino.preprocess** module, you can implement the
+**torchvision.transforms** feature easily and without the use of external libraries, reducing
+the overall application complexity and enabling additional performance optimizations.
 
 
 .. note::
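
As an illustration of what that page documents, embedding a torchvision pipeline might look like this (a minimal sketch using the PreprocessConverter helper from openvino.preprocess.torchvision; the model path and transform values are placeholder assumptions):

# Translate a torchvision preprocessing pipeline into OpenVINO preprocessing
# steps embedded in the model graph. Path and transform values are placeholders.
import numpy as np
from PIL import Image
import torchvision.transforms as transforms

import openvino as ov
from openvino.preprocess.torchvision import PreprocessConverter

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

core = ov.Core()
model = core.read_model("model.xml")  # placeholder IR path

# An input example lets the converter infer the input layout and data type.
example = Image.fromarray(np.zeros((300, 300, 3), dtype=np.uint8), "RGB")
model = PreprocessConverter.from_torchvision(
    model=model, transform=preprocess, input_example=example
)
compiled_model = core.compile_model(model, "CPU")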
@@ -1,37 +1,100 @@
 Product,Model,Framework,Precision,Node,Request Rate,Throughput [tok/s],TPOT Mean Latency
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,inf,270.55,839.37
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,2.0,269.6,847.81
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,1.0,268.92,840.1
 ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.2,92.75,75.75
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,inf,702.42,307.82
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,2.0,680.45,302.09
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.0,442.69,169.24
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.3,137.89,98.6
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.4,182.68,144.36
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.5,227.02,238.54
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.6,259.06,679.07
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.7,267.24,785.75
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.8,267.77,815.11
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.9,270.01,827.09
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,1.0,268.92,840.1
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,2.0,269.6,847.81
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,inf,270.55,839.37
 ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.2,92.63,63.23
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,inf,701.91,305.9
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,2.0,684.4,299.41
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.0,442.46,170.65
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.4,183.51,105.0
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.6,272.59,95.34
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.8,359.28,126.61
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.0,442.69,169.24
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.2,521.61,195.94
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.4,589.34,267.43
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.6,650.25,291.68
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.8,655.39,308.64
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,2.0,680.45,302.09
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,inf,702.42,307.82
 ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.2,92.89,54.69
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,inf,290.39,793.52
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,2.0,284.14,815.09
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,1.0,290.67,783.47
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.2,87.18,74.96
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,inf,873.93,245.31
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,2.0,774.3,233.49
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.0,427.37,114.16
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.2,88.9,60.04
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,inf,839.74,253.74
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,2.0,771.17,232.08
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.0,427.85,128.33
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.2,88.92,56.33
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,inf,275.71,810.89
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,2.0,278.07,809.3
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,1.0,272.54,811.37
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.4,184.37,77.0
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.6,273.06,101.81
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.8,360.22,135.38
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.0,442.46,170.65
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.2,519.5,208.44
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.4,590.11,252.86
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.6,651.09,286.93
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.8,670.74,298.02
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,2.0,684.4,299.41
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,inf,701.91,305.9
 ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.2,79.24,73.06
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,inf,799.46,265.5
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,2.0,707.46,234.44
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.0,380.61,104.71
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.3,118.42,90.31
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.4,157.04,113.23
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.5,193.85,203.97
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.6,232.36,253.17
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.7,260.56,581.45
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.8,271.97,761.05
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.9,273.36,787.74
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,1.0,272.54,811.37
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,2.0,278.07,809.3
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,inf,275.71,810.89
 ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.2,78.3,60.37
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,inf,843.51,252.12
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,2.0,698.38,247.77
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.0,376.36,139.62
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.4,156.42,69.27
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.6,232.27,77.79
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.8,307.37,90.07
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.0,380.61,104.71
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.2,452.18,127.36
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.4,519.44,156.18
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.6,587.62,169.44
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.8,649.94,198.44
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,2.0,707.46,234.44
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,inf,799.46,265.5
 ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.2,78.61,54.12
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.4,156.19,70.38
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.6,232.36,81.83
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.8,307.01,101.66
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.0,376.36,139.62
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.2,447.75,158.53
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.4,519.74,160.26
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.6,582.37,190.22
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.8,635.46,231.31
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,2.0,698.38,247.77
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,inf,843.51,252.12
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.2,87.18,74.96
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.3,130.74,92.67
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.4,172.94,117.03
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.5,214.71,172.69
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.6,255.45,282.74
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.7,280.38,629.68
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.8,280.55,765.16
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.9,289.65,765.65
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,1.0,290.67,783.47
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,2.0,284.14,815.09
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,inf,290.39,793.52
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.2,88.9,60.04
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.4,176.5,70.24
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.6,262.04,77.01
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.8,346.01,95.29
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.0,427.37,114.16
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.2,507.86,138.56
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.4,582.58,150.72
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.6,655.61,166.64
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.8,717.9,216.76
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,2.0,774.3,233.49
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,inf,873.93,245.31
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.2,88.92,56.33
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.4,175.99,72.72
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.6,261.96,84.24
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.8,346.78,101.67
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.0,427.85,128.33
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.2,506.17,150.01
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.4,581.72,167.61
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.6,651.97,190.91
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.8,713.2,222.56
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,2.0,771.17,232.08
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,inf,839.74,253.74
