
Commit 27ae75d

add OpenTelemetry_OPEA_Guide.rst and ChatQnA.md for telemetry support
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
1 parent 85c9248 commit 27ae75d

24 files changed: +288 −0 lines changed

OpenTelemetry_OPEA_Guide.rst
@@ -0,0 +1,180 @@
.. _OpenTelemetry_OPEA_Guide:

OpenTelemetry on OPEA Guide
#############################

Overview
********
OpenTelemetry (also referred to as OTel) is an open source observability framework made up of a collection of tools, APIs, and SDKs.
OTel enables developers to instrument, generate, collect, and export telemetry data for analysis and to understand software performance and behavior.
The telemetry data can come in the form of traces, metrics, and logs.
OPEA integrates OpenTelemetry's metrics and tracing capabilities to enhance its telemetry support, providing users with valuable insights into system performance.


How It Works
************
OPEA Comps provides telemetry functionality for metrics and tracing by integrating with tools such as Prometheus, Grafana, and Jaeger. Below is a brief introduction to the workflows of these tools:

.. image:: assets/opea_telemetry.jpg
   :width: 800
   :alt: Alternative text


The majority of OPEA's micro and mega services support OpenTelemetry metrics, which are exported in Prometheus format via the /metrics endpoint.
For further guidance, please refer to the section on `Telemetry Metrics <https://github.com/opea-project/GenAIComps/tree/main/comps/cores/telemetry#metrics>`_.
Prometheus collects these metrics from the OPEA service endpoints, while Grafana uses Prometheus as a data source to visualize them on pre-configured dashboards.

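As a quick sanity check, you can query a service's metrics endpoint directly. The sketch below is illustrative and assumes the ChatQnA megaservice's default port 8888; substitute the port of whichever OPEA service you want to inspect:

.. code-block:: bash

   # Dump the first few Prometheus-format metrics exposed by an OPEA service.
   curl -s http://${host_ip}:8888/metrics | head -n 20
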
OPEA also supports OpenTelemetry tracing, with several OPEA GenAIExamples instrumented to trace key functions such as microservice execution and LLM generation.
Additionally, Hugging Face's Text Embedding Inference (TEI) and Text Generation Inference (TGI) services are enabled for select OPEA GenAIExamples.
The Jaeger UI monitors trace events from OPEA microservices, TEI, and TGI. Once Jaeger endpoints are configured in OPEA microservices, TEI, and TGI,
trace data is automatically reported and visualized in the Jaeger UI.


Deployment
**********

In the OpenTelemetry-enabled GenAIExamples, OpenTelemetry metrics are activated by default, while OpenTelemetry tracing is initially disabled.
Similarly, the telemetry UI services, including Grafana, Prometheus, and Jaeger, are also disabled by default.
To enable OTel tracing along with Grafana, Prometheus, and Jaeger for an example, include an additional telemetry Docker Compose YAML file.
For instance, adding compose.telemetry.yaml alongside compose.yaml will activate all telemetry features for the example.

.. code-block:: bash

   source ./set_env.sh
   docker compose -f compose.yaml -f compose.telemetry.yaml up -d


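To confirm that the telemetry services started alongside the application, you can list the running containers. This is a minimal sketch; the exact container names vary from example to example:

.. code-block:: bash

   # The Grafana, Prometheus, and Jaeger containers should appear in the list.
   docker compose -f compose.yaml -f compose.telemetry.yaml ps
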
Below are the GenAIExamples that include support for Grafana, Prometheus, and Jaeger services.

.. toctree::
   :maxdepth: 1

   ChatQnA <deploy/ChatQnA>


How to Monitor
****************

OpenTelemetry metrics and tracing can be visualized through three primary monitoring web UIs: Prometheus, Grafana, and Jaeger.

1. Prometheus
+++++++++++++++

The Prometheus UI provides insight into which services have active metrics endpoints.
By default, Prometheus operates on port 9090.
You can access the Prometheus targets page at the following URL:

.. code-block:: bash

   http://${host_ip}:9090/targets

Services with accessible metrics endpoints are marked as "up" in Prometheus.
If a service is marked as "down," the Grafana dashboards will be unable to display its metrics.

.. image:: assets/prometheus.png
   :width: 800
   :alt: Alternative text

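The same target health information is available from the Prometheus HTTP API, which can be handy for quick command-line checks. The sketch below is illustrative and assumes ``jq`` is installed on the host:

.. code-block:: bash

   # Print each scrape target and its health ("up" or "down").
   curl -s "http://${host_ip}:9090/api/v1/targets" \
     | jq -r '.data.activeTargets[] | "\(.labels.job): \(.health)"'
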
2. Grafana
+++++++++++++++

The Grafana UI displays telemetry metrics through pre-defined dashboards, providing a clear visualization of the data.
For OPEA examples, Grafana is configured by default to use Prometheus as its data source, eliminating the need for manual setup.
The Grafana UI can be accessed at the following URL:

.. code-block:: bash

   http://${host_ip}:3000


.. image:: assets/grafana_init.png
   :width: 800
   :alt: Alternative text

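If the page does not load, a quick way to check whether Grafana itself is healthy is its standard health endpoint. This is an illustrative sketch, not part of the OPEA tooling:

.. code-block:: bash

   # Returns a small JSON document with the database status and Grafana version.
   curl -s http://${host_ip}:3000/api/health
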
To view the pre-defined dashboards, click on the "Dashboards" tab located on the left-hand side of the Grafana UI.
This allows you to explore the various dashboards that have been set up to visualize the telemetry metrics.

.. image:: assets/grafana_dashboard_init.png
   :width: 800
   :alt: Alternative text

Detailed explanations of each dashboard are provided within the telemetry sections of the respective GenAIExamples.
These sections offer insights into how to interpret the data and utilize the dashboards effectively for monitoring and analysis.

.. toctree::
   :maxdepth: 1

   ChatQnA <deploy/ChatQnA>


3. Jaeger
+++++++++++++++

The Jaeger UI is instrumental in understanding function tracing for each request, providing visibility into the execution flow and timing of microservices.
OPEA traces the execution time of each microservice and monitors key functions within them.
By default, Jaeger operates on port 16686.
The Jaeger UI can be accessed at the following URL:

.. code-block:: bash

   http://${host_ip}:16686

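The same port also serves Jaeger's HTTP query API, which backs the UI and can be used for quick checks from the command line. The calls below are an illustrative sketch rather than part of the OPEA tooling:

.. code-block:: bash

   # List the services that have reported traces to Jaeger.
   curl -s "http://${host_ip}:16686/api/services"

   # Fetch the five most recent traces for the "opea" service.
   curl -s "http://${host_ip}:16686/api/traces?service=opea&limit=5"
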
Traces will only appear in the Jaeger UI if the relevant functions have been executed.
Therefore, without running the example, the UI will not display any trace data.

.. image:: assets/jaeger_ui_init.png
   :width: 400
   :alt: Alternative text

Once the example is run, refresh the Jaeger UI webpage, and the OPEA service should appear under the "Services" tab,
indicating that trace data is being captured and displayed.

.. image:: assets/jaeger_ui_opea.png
   :width: 400
   :alt: Alternative text

Select "opea" as the service, then click the "Find Traces" button to view the trace data associated with the service's execution.

.. image:: assets/jaeger_ui_opea_trace.png
   :width: 400
   :alt: Alternative text


All traces are displayed in the UI.
The diagram in the upper right corner provides a visual representation of all requests along the timeline, while
the diagrams in the lower right corner illustrate all spans within each request, offering detailed insight into the execution flow and timing.

.. image:: assets/jaeger_ui_opea_chatqna_1req.png
   :width: 800
   :alt: Alternative text

Detailed explanations of each Jaeger diagram are provided within the telemetry sections of the respective GenAIExamples.
These sections offer insights into how to interpret the data and utilize the diagrams effectively for monitoring and analysis.

.. toctree::
   :maxdepth: 1

   ChatQnA <deploy/ChatQnA>


Code Instrumentation for OPEA Tracing
****************************************

Enabling OPEA OpenTelemetry tracing for a function is straightforward.
First, import opea_telemetry, and then apply the Python decorator @opea_telemetry to the function you wish to trace.
Below is an example of how to trace your_func using OPEA tracing:

.. code-block:: python

   from comps import opea_telemetry

   # The decorator records a trace span each time the decorated function runs.
   @opea_telemetry
   async def your_func():
       pass

ChatQnA.md: +107 lines changed
@@ -0,0 +1,107 @@
# OpenTelemetry on ChatQnA Application

Each microservice in ChatQnA is instrumented with `opea_telemetry`, enabling Jaeger to provide a detailed time breakdown across microservices for each request.
Additionally, ChatQnA features a pre-defined Grafana dashboard for its megaservice, alongside a vLLM Grafana dashboard.
A dashboard for monitoring CPU statistics is also available, offering comprehensive insights into system performance and resource utilization.

# Table of contents

1. [Deployment](#deployment)
2. [Telemetry Tracing with Jaeger on Gaudi](#telemetry-tracing-with-jaeger-on-gaudi)
3. [Telemetry Metrics with Grafana on Gaudi](#telemetry-metrics-with-grafana-on-gaudi)

## Deployment

### Xeon

```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
```

### Gaudi

```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
```

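Once the services are up, send a question to ChatQnA so that traces and metrics are generated. This is a sketch based on the standard ChatQnA example; the port (8888) and payload may differ in your deployment:

```bash
curl http://${host_ip}:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is OPEA?"}'
```
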
## Telemetry Tracing with Jaeger on Gaudi

After ChatQnA processes a question, two traces should appear along the timeline.
The trace for `opea: ServiceOrchestrator.schedule` runs on the CPU and includes seven spans, one of which represents the LLM host functions in general.
For LLM functions executed on Gaudi, stream requests are displayed under `opea: llm_generate_stream`.
This trace contains two spans: one for the first token and another for all subsequent tokens.

![chatqna_1req](../assets/jaeger_ui_opea_chatqna_1req.png)

The first trace along the timeline is `opea: ServiceOrchestrator.schedule`, which operates on the CPU.
This trace provides insight into the orchestration and scheduling of services within the ChatQnA megaservice, highlighting the execution flow during the process.

![chatqna_cpu_req](../assets/jaeger_ui_opea_chatqna_req_cpu.png)

Clicking on the `opea: ServiceOrchestrator.schedule` trace expands it to reveal seven spans along the timeline.
The first span represents the main schedule function, which has minimal self-execution time, indicated in black.
The second span corresponds to the embedding microservice execution time, taking 33.72 ms as shown in the diagram.
Following the embedding is the retriever span, which took only 3.13 ms.
The last span captures the LLM functions on the CPU, with an execution time of 41.99 ms.
These spans provide a detailed breakdown of the execution flow and timing for each component within the service orchestration.

![chatqna_cpu_breakdown](../assets/jaeger_ui_opea_chatqna_cpu_breakdown.png)

The second trace following the schedule trace is `opea: llm_generate_stream`, which operates on Gaudi, as depicted in the diagram.
This trace provides insight into the execution of LLM functions on Gaudi,
highlighting the processing of stream requests and the associated spans for token generation.

![chatqna_gaudi_req](../assets/jaeger_ui_opea_chatqna_req_gaudi.png)

Clicking on the `opea: llm_generate_stream` trace expands it to reveal two spans along the timeline.
The first span represents the execution time for the first token, which took 15.12 ms in this run.
The second span captures the execution time for all subsequent tokens, taking 920 ms as shown in the diagram.
These spans provide a detailed view of the token generation process and the performance of LLM functions on Gaudi.

![chatqna_gaudi_breakdown](../assets/jaeger_ui_opea_chatqna_req_breakdown_2.png)

Overall, the traces on the CPU consist of seven spans and are represented as larger circles.
In contrast, the traces on Gaudi have two spans and are depicted as smaller circles.
The diagrams below illustrate a run with 16 user requests, resulting in a total of 32 traces.
In this scenario, the larger circles, representing CPU traces, took less time than the smaller circles,
indicating that the requests required more processing time on Gaudi than on the CPU.

![chatqna_16reqs](../assets/chatqna_16reqs.png)

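To reproduce a similar picture, you can fire a small batch of concurrent requests at the megaservice. This is an illustrative sketch only; it assumes the ChatQnA megaservice listens on port 8888 and accepts the standard `{"messages": ...}` payload:

```bash
# Send 16 concurrent questions, then wait for all of them to finish.
for i in $(seq 16); do
  curl -s http://${host_ip}:8888/v1/chatqna \
    -H "Content-Type: application/json" \
    -d '{"messages": "What is OPEA?"}' > /dev/null &
done
wait
```
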
## Telemetry Metrics with Grafana on Gaudi

The ChatQnA application offers several useful dashboards that provide valuable insights into its performance and operations.
These dashboards are designed to help monitor various aspects of the application, such as service execution times, resource utilization, and system health,
enabling users to effectively manage and optimize the application.

### ChatQnA MegaService Dashboard

This dashboard provides metrics for services within the ChatQnA megaservice.
The `chatqna-backend-server` service, which functions as the megaservice,
is highlighted with its average response time displayed across multiple runs.
Additionally, the dashboard presents CPU and memory usage statistics for the megaservice,
offering a comprehensive view of its performance and resource consumption.

![chatqna_backend_server](../assets/Grafana_chatqna_backend_server_1.png)

The dashboard can also display metrics for the `dataprep-redis-service` and the retriever service.
These metrics provide insights into the performance and resource utilization of these services,
allowing for a more comprehensive understanding of the ChatQnA application's overall operation.

![chatqna_dataprep](../assets/Grafana_chatqna_dataprep.png)

![chatqna_retriever](../assets/Grafana_chatqna_retriever.png)

### LLM Dashboard

This dashboard presents metrics for the LLM service, including key performance indicators such as request latency, time per output token,
and time to first token, among others.
These metrics offer valuable insights into the efficiency and responsiveness of the LLM service,
helping to identify areas for optimization and ensuring smooth operation.

![vllm_dashboard](../assets/Grafana_vLLM.png)

The dashboard also displays metrics for request prompt length and output length.

![vllm_dashboard_2](../assets/Grafana_vLLM_2.png)
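The panels on these dashboards are backed by ordinary PromQL queries against the metrics that vLLM exports. As an illustrative sketch (the metric name follows vLLM's Prometheus naming and may differ between vLLM versions), the 95th-percentile time to first token over the last five minutes can be queried directly from Prometheus:

```bash
curl -s "http://${host_ip}:9090/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.95, rate(vllm:time_to_first_token_seconds_bucket[5m]))'
```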

tutorial/index.rst

+1
@@ -16,6 +16,7 @@ Provide following tutorials to cover common user cases:
    DocSum/DocSum_Guide
    DocIndexRetriever/DocIndexRetriever_Guide
    VideoQnA/VideoQnA_Guide
+   OpenTelemetry/OpenTelemetry_OPEA_Guide

-----
