[DOCS] Moving OVMS to top menu level #29050

Merged
2 changes: 1 addition & 1 deletion docs/CMakeLists.txt
@@ -84,7 +84,7 @@ function(build_docs)
list(APPEND commands COMMAND ${Python3_EXECUTABLE} ${FILE_HELPER_SCRIPT}
--filetype=md
--input_dir=${OVMS_DOCS_DIR}
- --output_dir=${SPHINX_SOURCE_DIR}/openvino-workflow/model-server
+ --output_dir=${SPHINX_SOURCE_DIR}/model-server
--exclude_dir=${SPHINX_SOURCE_DIR})
list(APPEND commands COMMAND ${CMAKE_COMMAND} -E cmake_echo_color --green "FINISHED preprocessing OVMS")
endif()
@@ -38,7 +38,7 @@ and TensorFlow models during training.

| **OpenVINO Model Server**
| :bdg-link-dark:`GitHub <https://github.com/openvinotoolkit/model_server>`
- :bdg-link-success:`User Guide <https://docs.openvino.ai/2025/openvino-workflow/model-server/ovms_what_is_openvino_model_server.html>`
+ :bdg-link-success:`User Guide <https://docs.openvino.ai/2025/model-server/ovms_what_is_openvino_model_server.html>`

A high-performance system that can be used to access the host models via request to the model
server.
@@ -17,7 +17,7 @@ In this release, one person performs the role of both the Model Developer and th
Overview
########

- The OpenVINO™ Security Add-on works with the :doc:`OpenVINO™ Model Server <../../../openvino-workflow/model-server/ovms_what_is_openvino_model_server>` on Intel® architecture. Together, the OpenVINO™ Security Add-on and the OpenVINO™ Model Server provide a way for Model Developers and Independent Software Vendors to use secure packaging and secure model execution to enable access control to the OpenVINO™ models, and for model Users to run inference within assigned limits.
+ The OpenVINO™ Security Add-on works with the :doc:`OpenVINO™ Model Server <../../../../model-server/ovms_what_is_openvino_model_server>` on Intel® architecture. Together, the OpenVINO™ Security Add-on and the OpenVINO™ Model Server provide a way for Model Developers and Independent Software Vendors to use secure packaging and secure model execution to enable access control to the OpenVINO™ models, and for model Users to run inference within assigned limits.

The OpenVINO™ Security Add-on consists of three components that run in Kernel-based Virtual Machines (KVMs). These components provide a way to run security-sensitive operations in an isolated environment. A brief description of the three components are as follows. Click each triangled line for more information about each.

@@ -18,7 +18,7 @@ Performance Benchmarks

This page presents benchmark results for the
`Intel® Distribution of OpenVINO™ toolkit <https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html>`__
- and :doc:`OpenVINO Model Server <../openvino-workflow/model-server/ovms_what_is_openvino_model_server>`, for a representative
+ and :doc:`OpenVINO Model Server <../../model-server/ovms_what_is_openvino_model_server>`, for a representative
selection of public neural networks and Intel® devices. The results may help you decide which
hardware to use in your applications or plan AI workload for the hardware you have already
implemented in your solutions. Click the buttons below to see the chosen benchmark data.
27 changes: 19 additions & 8 deletions docs/articles_en/openvino-workflow-generative.rst
@@ -55,14 +55,22 @@ options:
as well as conversion on the fly. For integration with the final product it may offer
lower performance, though.

- .. tab-item:: Base OpenVINO (not recommended)
+ .. tab-item:: OpenVINO™ Model Server

- Note that the base version of OpenVINO may also be used to run generative AI. Although it may
- offer a simpler environment, with fewer dependencies, it has significant limitations and a more
- demanding implementation process.
+ | - Easy and quick deployment of models to edge or cloud.
+ | - Includes endpoints for serving generative AI models.
+ | - Available in both Python and C++.
+ | - Allows client applications in any programming language that supports REST or gRPC.

- To learn more, refer to the article for the 2024.6 OpenVINO version:
- `Generative AI with Base OpenVINO <https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-native-ov.html>`__
+ :doc:`OpenVINO™ Model Server <model-server/ovms_what_is_openvino_model_server>`
+ provides a set of REST API endpoints dedicated to generative use cases. The endpoints
+ simplify writing AI applications, ensure scalability, and provide state-of-the-art
+ performance optimizations. They include OpenAI API for:
+ `text generation <https://openvino-doc.iotg.sclab.intel.com/seba-test-8/model-server/ovms_docs_rest_api_chat.html>`__,
+ `embeddings <https://openvino-doc.iotg.sclab.intel.com/seba-test-8/model-server/ovms_docs_rest_api_embeddings.html>`__,
+ and `reranking <https://openvino-doc.iotg.sclab.intel.com/seba-test-8/model-server/ovms_docs_rest_api_rerank.html>`__.
+ The model server supports deployments as containers or binary applications on Linux and Windows with CPU or GPU acceleration.
+ See the :doc:`demos <model-server/ovms_docs_demos>`.
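For a quick illustration of these endpoints, here is a minimal client sketch. It assumes an OVMS instance already running on localhost:8000 with its OpenAI-compatible API exposed under ``/v3``, and uses a placeholder model name; both depend on your deployment.

.. code-block:: python

   from openai import OpenAI

   # Assumptions: OVMS listens on localhost:8000 and exposes the OpenAI API
   # under /v3; "my-chat-model" is a placeholder for the served model's name.
   client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

   response = client.chat.completions.create(
       model="my-chat-model",
       messages=[{"role": "user", "content": "What is OpenVINO?"}],
   )
   print(response.choices[0].message.content)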



@@ -94,10 +102,13 @@ The advantages of using OpenVINO for generative model deployment:
better performance than Python-based runtimes.


+ You can run Generative AI models, using native OpenVINO API, although it is not recommended.
+ If you want to learn how to do it, refer to
+ `the 24.6 documentation <https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-native-ov.html>`__.


Proceed to guides on:

* :doc:`OpenVINO GenAI <./openvino-workflow-generative/inference-with-genai>`
* :doc:`Hugging Face and Optimum Intel <./openvino-workflow-generative/inference-with-optimum-intel>`
* `Generative AI with Base OpenVINO <https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-native-ov.html>`__


9 changes: 4 additions & 5 deletions docs/articles_en/openvino-workflow.rst
@@ -13,8 +13,7 @@ OpenVINO Workflow
Model Preparation <openvino-workflow/model-preparation>
openvino-workflow/model-optimization
Running Inference <openvino-workflow/running-inference>
Deployment on a Local System <openvino-workflow/deployment-locally>
- Deployment on a Model Server <openvino-workflow/model-server/ovms_what_is_openvino_model_server>
openvino-workflow/torch-compile


@@ -86,11 +85,11 @@ OpenVINO uses the following functions for reading, converting, and saving models
and the quickest way of running a deep learning model.

| :doc:`Deployment Option 1. Using OpenVINO Runtime <openvino-workflow/deployment-locally>`
- | Deploy a model locally, reading the file directly from your application and utilizing about-openvino/additional-resources available to the system.
+ | Deploy a model locally, reading the file directly from your application and utilizing resources available to the system.
| Deployment on a local system uses the steps described in the section on running inference.
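As a sketch of this option, an application reads and compiles the model file directly; the file path, target device, and input below are placeholders that depend on your model.

.. code-block:: python

   import numpy as np
   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")          # placeholder path to an OpenVINO IR file
   compiled = core.compile_model(model, "CPU")   # compile for a local device

   # Placeholder input: a zero tensor matching the model's first input shape
   # (assumes the model has static input shapes).
   input_tensor = np.zeros(tuple(compiled.inputs[0].shape), dtype=np.float32)
   results = compiled(input_tensor)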

- | :doc:`Deployment Option 2. Using Model Server <openvino-workflow/model-server/ovms_what_is_openvino_model_server>`
- | Deploy a model remotely, connecting your application to an inference server and utilizing external about-openvino/additional-resources, with no impact on the app's performance.
+ | :doc:`Deployment Option 2. Using Model Server <../model-server/ovms_what_is_openvino_model_server>`
+ | Deploy a model remotely, connecting your application to an inference server and utilizing external resources, with no impact on the app's performance.
| Deployment on OpenVINO Model Server is quick and does not require any additional steps described in the section on running inference.
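A minimal sketch of the remote path, assuming an OVMS instance at localhost:8000 serving a hypothetical model named "my-model" through the TensorFlow-Serving-style REST API:

.. code-block:: python

   import requests

   # Assumptions: OVMS listens on localhost:8000; "my-model" is a placeholder name.
   status = requests.get("http://localhost:8000/v1/models/my-model")
   print(status.json())  # reports served model versions and their state

   # The input layout depends on the served model; this payload is dummy data.
   payload = {"instances": [[0.0, 0.0, 0.0]]}
   response = requests.post(
       "http://localhost:8000/v1/models/my-model:predict", json=payload
   )
   print(response.json())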

| :doc:`Deployment Option 3. Using torch.compile for PyTorch 2.0 <openvino-workflow/torch-compile>`
@@ -140,4 +140,4 @@ sequences.
You can find more examples demonstrating how to work with states in other articles:

* `LLaVA-NeXT Multimodal Chatbot notebook <../../notebooks/llava-next-multimodal-chatbot-with-output.html>`__
- * :doc:`Serving Stateful Models with OpenVINO Model Server <../../openvino-workflow/model-server/ovms_docs_stateful_models>`
+ * :doc:`Serving Stateful Models with OpenVINO Model Server <../../model-server/ovms_docs_stateful_models>`

Large diffs are not rendered by default.

5 changes: 3 additions & 2 deletions docs/sphinx_setup/index.rst
@@ -38,7 +38,7 @@ hardware and environments, on-premises and on-device, in the browser or in the c
<li id="ov-homepage-slide3" class="splide__slide">
<p class="ov-homepage-slide-title">Improved model serving</p>
<p class="ov-homepage-slide-subtitle">OpenVINO Model Server has improved parallel inference!</p>
- <a class="ov-homepage-banner-btn" href="https://docs.openvino.ai/2025/openvino-workflow/model-server/ovms_what_is_openvino_model_server.html">Learn more</a>
+ <a class="ov-homepage-banner-btn" href="https://docs.openvino.ai/2025/model-server/ovms_what_is_openvino_model_server.html">Learn more</a>
</li>
<li id="ov-homepage-slide4" class="splide__slide">
<p class="ov-homepage-slide-title">OpenVINO via PyTorch 2.0 torch.compile()</p>
@@ -124,7 +124,7 @@ Places to Begin

Cloud-ready deployments for microservice applications.

- .. button-link:: openvino-workflow/model-server/ovms_what_is_openvino_model_server.html
+ .. button-link:: model-server/ovms_what_is_openvino_model_server.html
:color: primary
:outline:

@@ -195,5 +195,6 @@ Key Features
GET STARTED <get-started>
HOW TO USE - MAIN WORKFLOW <openvino-workflow>
HOW TO USE - GENERATIVE AI WORKFLOW <openvino-workflow-generative>
+ HOW TO USE - MODEL SERVING <model-server/ovms_what_is_openvino_model_server>
REFERENCE DOCUMENTATION <documentation>
ABOUT OPENVINO <about-openvino>