
Commit 5264c99

[DOCS] tiny article name changes (openvinotoolkit#25910)
1 parent 3cf2744 commit 5264c99

9 files changed: +131 -71 lines changed

docs/articles_en/get-started/install-openvino/install-openvino-docker-linux.rst

+4-4
@@ -1,13 +1,13 @@
-Install Intel® Distribution of OpenVINO™ toolkit from a Docker Image
+Install Intel® Distribution of OpenVINO™ Toolkit From a Docker Image
 =======================================================================
 
 .. meta::
    :description: Learn how to use a prebuilt Docker image or create an image
                  manually to install OpenVINO™ Runtime on Linux and Windows operating systems.
 
-This guide presents information on how to use a pre-built Docker image/create an image manually to install OpenVINO™ Runtime.
-
-Supported host operating systems for the Docker Base image:
+This guide presents information on how to use a pre-built Docker image or create a new image
+manually, to install OpenVINO™ Runtime. The supported host operating systems for the Docker
+base image are:
 
 - Linux
 - Windows (WSL2)

docs/articles_en/learn-openvino/llm_inference_guide.rst

+2-4
@@ -1,5 +1,3 @@
-.. {#native_vs_hugging_face_api}
-
 Large Language Model Inference Guide
 ========================================
 
@@ -14,8 +12,8 @@ Large Language Model Inference Guide
    :hidden:
 
    Run LLMs with Optimum Intel <llm_inference_guide/llm-inference-hf>
-   Run LLMs with OpenVINO GenAI Flavor <llm_inference_guide/genai-guide>
-   Run LLMs with Base OpenVINO <llm_inference_guide/llm-inference-native-ov>
+   Run LLMs on OpenVINO GenAI Flavor <llm_inference_guide/genai-guide>
+   Run LLMs on Base OpenVINO <llm_inference_guide/llm-inference-native-ov>
    OpenVINO Tokenizers <llm_inference_guide/ov-tokenizers>
 
 Large Language Models (LLMs) like GPT are transformative deep learning networks capable of a

docs/articles_en/learn-openvino/llm_inference_guide/genai-guide.rst

+2-2
@@ -1,5 +1,5 @@
-Run LLMs with OpenVINO GenAI Flavor
-=====================================
+Run LLM Inference on OpenVINO with the GenAI Flavor
+===============================================================================================
 
 .. meta::
    :description: Learn how to use the OpenVINO GenAI flavor to execute LLM models.
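
For readers following the renamed article, a minimal sketch of the GenAI flavor in Python, assuming a model already exported to OpenVINO IR (the model directory, device, and prompt below are placeholders):

import openvino_genai

# Placeholder path: an LLM exported to OpenVINO IR beforehand,
# e.g. with `optimum-cli export openvino`
pipe = openvino_genai.LLMPipeline("./model_dir", "CPU")
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))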

docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-hf.rst

+5-3
@@ -1,7 +1,9 @@
-.. {#llm_inference}
-
 Run LLMs with Hugging Face and Optimum Intel
-=====================================================
+===============================================================================================
+
+.. meta::
+   :description: Learn how to use the native OpenVINO package to execute LLM models.
+
 
 The steps below show how to load and infer LLMs from
 `Hugging Face <https://huggingface.co/models>`__ using Optimum Intel. They also show how to
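
As a quick reference for the flow this article documents, a minimal sketch of loading and running a Hub model through Optimum Intel (the model ID is only an example; any causal LM from the Hub works the same way):

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # example model from the Hub
# export=True converts the original checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))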

docs/articles_en/learn-openvino/llm_inference_guide/llm-inference-native-ov.rst

+2-4
@@ -1,7 +1,5 @@
-.. {#llm_inference_native_ov}
-
-Run LLMs with Base OpenVINO
-===============================
+Run LLM Inference on Native OpenVINO (not recommended)
+===============================================================================================
 
 To run Generative AI models using native OpenVINO APIs you need to follow regular
 **Convert -> Optimize -> Deploy** path with a few simplifications.
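
For contrast with the higher-level options, a minimal sketch of a single forward pass through the native API, assuming the model was already converted to IR (the file name and token IDs are illustrative; real LLM IRs often expect extra inputs such as an attention mask, and generation requires a manual loop, which is why this path is not recommended):

import numpy as np
import openvino as ov

core = ov.Core()
# "model.xml" is a placeholder for an LLM converted to OpenVINO IR beforehand
compiled = core.compile_model(core.read_model("model.xml"), "CPU")

input_ids = np.array([[1, 529, 29989, 1792, 29989, 29958]])  # illustrative token IDs only
results = compiled({"input_ids": input_ids})  # one forward pass, no generation loop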

docs/articles_en/learn-openvino/llm_inference_guide/ov-tokenizers.rst

+5-7
@@ -1,12 +1,10 @@
-.. {#tokenizers}
-
 OpenVINO Tokenizers
 ===============================
 
-Tokenization is a necessary step in text processing using various models, including text generation with LLMs.
-Tokenizers convert the input text into a sequence of tokens with corresponding IDs, so that
-the model can understand and process it during inference. The transformation of a sequence of numbers into a
-string is calleddetokenization.
+Tokenization is a necessary step in text processing using various models, including text
+generation with LLMs. Tokenizers convert the input text into a sequence of tokens with
+corresponding IDs, so that the model can understand and process it during inference. The
+transformation of a sequence of numbers into a string is called detokenization.
 
 .. image:: ../../assets/images/tokenization.svg
    :align: center
@@ -338,7 +336,7 @@ Additional Resources
 
 * `OpenVINO Tokenizers repo <https://github.com/openvinotoolkit/openvino_tokenizers>`__
 * `OpenVINO Tokenizers Notebook <https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/openvino-tokenizers>`__
-* `Text generation C++ samples that support most popular models like LLaMA 2 <https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/cpp/greedy_causal_lm>`__
+* `Text generation C++ samples that support most popular models like LLaMA 2 <https://github.com/openvinotoolkit/openvino.genai/tree/master/text_generation/causal_lm/cpp>`__
 * `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
 
 
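To make the tokenize/detokenize round trip concrete, a minimal sketch using the openvino_tokenizers package (the model ID is an example; the output key names follow the project's README):

import openvino as ov
from openvino_tokenizers import convert_tokenizer
from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)

core = ov.Core()
tokenize = core.compile_model(ov_tokenizer)      # text -> token IDs
detokenize = core.compile_model(ov_detokenizer)  # token IDs -> text

token_ids = tokenize(["What is OpenVINO?"])["input_ids"]
print(detokenize(token_ids)["string_output"])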

docs/articles_en/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.rst

+10-8
@@ -1,6 +1,4 @@
-.. {#openvino_docs_OV_UG_Hetero_execution}
-
-Heterogeneous execution
+Heterogeneous Execution
 =======================
 
 
@@ -9,24 +7,28 @@ Heterogeneous execution
    the inference of one model on several computing devices.
 
 
-Heterogeneous execution enables executing inference of one model on several devices. Its purpose is to:
+Heterogeneous execution enables executing inference of one model on several devices.
+Its purpose is to:
 
-* Utilize the power of accelerators to process the heaviest parts of the model and to execute unsupported operations on fallback devices, like the CPU.
+* Utilize the power of accelerators to process the heaviest parts of the model and to execute
+  unsupported operations on fallback devices, like the CPU.
 * Utilize all available hardware more efficiently during one inference.
 
 Execution via the heterogeneous mode can be divided into two independent steps:
 
 1. Setting hardware affinity to operations (`ov::Core::query_model <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html#doxid-classov-1-1-core-1acdf8e64824fe4cf147c3b52ab32c1aab>`__ is used internally by the Hetero device).
 2. Compiling a model to the Heterogeneous device assumes splitting the model to parts, compiling them on the specified devices (via `ov::device::priorities <https://docs.openvino.ai/2024/api/c_cpp_api/structov_1_1device_1_1_priorities.html>`__), and executing them in the Heterogeneous mode. The model is split to subgraphs in accordance with the affinities, where a set of connected operations with the same affinity is to be a dedicated subgraph. Each subgraph is compiled on a dedicated device and multiple `ov::CompiledModel <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_compiled_model.html#doxid-classov-1-1-compiled-model>`__ objects are made, which are connected via automatically allocated intermediate tensors.
-
+
 If you set pipeline parallelism (via ``ov::hint::model_distribution_policy``), the model is split into multiple stages, and each stage is assigned to a different device. The output of one stage is fed as input to the next stage.
 
 These two steps are not interconnected and affinities can be set in one of two ways, used separately or in combination (as described below): in the ``manual`` or the ``automatic`` mode.
 
-Defining and Configuring the Hetero Device
+Defining and configuring the Hetero device
 ##########################################
 
-Following the OpenVINO™ naming convention, the Hetero execution plugin is assigned the label of ``"HETERO".`` It may be defined with no additional parameters, resulting in defaults being used, or configured further with the following setup options:
+Following the OpenVINO™ naming convention, the Hetero execution plugin is assigned the label of
+``"HETERO".`` It may be defined with no additional parameters, resulting in defaults being used,
+or configured further with the following setup options:
 
 
 +--------------------------------------------+-------------------------------------------------------------+-----------------------------------------------------------+
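
A minimal sketch of the compile-time usage this article describes, assuming a system with a GPU and CPU fallback (the IR file name is a placeholder):

import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder IR file
# Device priorities: heavy subgraphs go to GPU; unsupported operations
# fall back to CPU (the "automatic" mode described above)
compiled = core.compile_model(model, "HETERO:GPU,CPU")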

docs/articles_en/openvino-workflow/running-inference/optimize-inference/optimize-preprocessing/torchvision-preprocessing-converter.rst

+8-9
@@ -1,6 +1,4 @@
-.. {#torchvision_preprocessing_converter}
-
-Torchvision preprocessing converter
+Torchvision Preprocessing Converter
 =======================================
 
 
@@ -9,13 +7,14 @@ Torchvision preprocessing converter
    to optimize model inference.
 
 
-The Torchvision-to-OpenVINO converter enables automatic translation of operators from the torchvision
-preprocessing pipeline to the OpenVINO format and embed them in your model. It is often used to adjust
-images serving as input for AI models to have proper dimensions or data types.
+The Torchvision-to-OpenVINO converter enables automatic translation of operators from the
+torchvision preprocessing pipeline to the OpenVINO format and embed them in your model. It is
+often used to adjust images serving as input for AI models to have proper dimensions or data
+types.
 
-As the converter is fully based on the **openvino.preprocess** module, you can implement the **torchvision.transforms**
-feature easily and without the use of external libraries, reducing the overall application complexity
-and enabling additional performance optimizations.
+As the converter is fully based on the **openvino.preprocess** module, you can implement the
+**torchvision.transforms** feature easily and without the use of external libraries, reducing
+the overall application complexity and enabling additional performance optimizations.
 
 
 .. note::
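
A minimal sketch of embedding a torchvision pipeline, assuming the ``PreprocessConverter.from_torchvision`` entry point from the **openvino.preprocess** torchvision module and placeholder model and image files:

import torchvision.transforms as transforms
import openvino as ov
from openvino.preprocess.torchvision import PreprocessConverter
from PIL import Image

# Example torchvision preprocessing pipeline to be embedded in the model
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

core = ov.Core()
model = core.read_model("model.xml")  # placeholder IR file
model = PreprocessConverter.from_torchvision(
    model=model,
    transform=preprocess,
    input_example=Image.open("sample.jpg"),  # placeholder image
)
compiled = core.compile_model(model, "CPU")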
@@ -1,37 +1,100 @@
 Product,Model,Framework,Precision,Node,Request Rate,Throughput [tok/s],TPOT Mean Latency
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,inf,270.55,839.37
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,2.0,269.6,847.81
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,1.0,268.92,840.1
 ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.2,92.75,75.75
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,inf,702.42,307.82
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,2.0,680.45,302.09
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.0,442.69,169.24
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.3,137.89,98.6
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.4,182.68,144.36
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.5,227.02,238.54
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.6,259.06,679.07
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.7,267.24,785.75
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.8,267.77,815.11
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.9,270.01,827.09
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,1.0,268.92,840.1
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,2.0,269.6,847.81
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,inf,270.55,839.37
 ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.2,92.63,63.23
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,inf,701.91,305.9
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,2.0,684.4,299.41
-ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.0,442.46,170.65
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.4,183.51,105.0
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.6,272.59,95.34
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.8,359.28,126.61
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.0,442.69,169.24
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.2,521.61,195.94
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.4,589.34,267.43
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.6,650.25,291.68
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.8,655.39,308.64
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,2.0,680.45,302.09
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,inf,702.42,307.82
 ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.2,92.89,54.69
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,inf,290.39,793.52
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,2.0,284.14,815.09
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,1.0,290.67,783.47
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.2,87.18,74.96
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,inf,873.93,245.31
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,2.0,774.3,233.49
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.0,427.37,114.16
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.2,88.9,60.04
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,inf,839.74,253.74
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,2.0,771.17,232.08
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.0,427.85,128.33
-ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.2,88.92,56.33
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,inf,275.71,810.89
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,2.0,278.07,809.3
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,1.0,272.54,811.37
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.4,184.37,77.0
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.6,273.06,101.81
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.8,360.22,135.38
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.0,442.46,170.65
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.2,519.5,208.44
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.4,590.11,252.86
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.6,651.09,286.93
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.8,670.74,298.02
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,2.0,684.4,299.41
+ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,inf,701.91,305.9
 ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.2,79.24,73.06
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,inf,799.46,265.5
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,2.0,707.46,234.44
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.0,380.61,104.71
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.3,118.42,90.31
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.4,157.04,113.23
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.5,193.85,203.97
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.6,232.36,253.17
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.7,260.56,581.45
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.8,271.97,761.05
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.9,273.36,787.74
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,1.0,272.54,811.37
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,2.0,278.07,809.3
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,inf,275.71,810.89
 ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.2,78.3,60.37
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,inf,843.51,252.12
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,2.0,698.38,247.77
-ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.0,376.36,139.62
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.4,156.42,69.27
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.6,232.27,77.79
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.8,307.37,90.07
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.0,380.61,104.71
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.2,452.18,127.36
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.4,519.44,156.18
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.6,587.62,169.44
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.8,649.94,198.44
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,2.0,707.46,234.44
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,inf,799.46,265.5
 ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.2,78.61,54.12
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.4,156.19,70.38
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.6,232.36,81.83
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.8,307.01,101.66
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.0,376.36,139.62
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.2,447.75,158.53
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.4,519.74,160.26
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.6,582.37,190.22
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.8,635.46,231.31
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,2.0,698.38,247.77
+ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,inf,843.51,252.12
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.2,87.18,74.96
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.3,130.74,92.67
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.4,172.94,117.03
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.5,214.71,172.69
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.6,255.45,282.74
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.7,280.38,629.68
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.8,280.55,765.16
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.9,289.65,765.65
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,1.0,290.67,783.47
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,2.0,284.14,815.09
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,inf,290.39,793.52
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.2,88.9,60.04
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.4,176.5,70.24
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.6,262.04,77.01
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.8,346.01,95.29
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.0,427.37,114.16
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.2,507.86,138.56
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.4,582.58,150.72
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.6,655.61,166.64
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.8,717.9,216.76
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,2.0,774.3,233.49
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,inf,873.93,245.31
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.2,88.92,56.33
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.4,175.99,72.72
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.6,261.96,84.24
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.8,346.78,101.67
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.0,427.85,128.33
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.2,506.17,150.01
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.4,581.72,167.61
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.6,651.97,190.91
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.8,713.2,222.56
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,2.0,771.17,232.08
+ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,inf,839.74,253.74
