|
| 1 | +.. _OpenTelemetry_OPEA_Guide: |
| 2 | + |
| 3 | +OpenTelemetry on OPEA Guide |
| 4 | +############################# |
| 5 | + |
| 6 | +Overview |
| 7 | +******** |
| 8 | +OpenTelemetry (also referred to as OTel) is an open source observability framework made up of a collection of tools, APIs, and SDKs. |
| 9 | +OTel enables developers to instrument, generate, collect, and export telemetry data for analysis and to understand software performance and behavior. |
| 10 | +The telemetry data can come in the form of traces, metrics, and logs. |
| 11 | +OPEA integrates OpenTelemetry's metrics and tracing capabilities to enhance its telemetry support, providing users with valuable insights into system performance. |
| 12 | + |
| 13 | + |
| 14 | +How It Works |
| 15 | +************ |
| 16 | +OPEA Comps offers telemetry functionalities for metrics and tracing by integrating with tools such as Prometheus, Grafana, and Jaeger. Below is a brief introduction to the workflows of those tools: |
| 17 | + |
| 18 | +.. image:: assets/opea_telemetry.jpg |
| 19 | + :width: 800 |
| 20 | + :alt: Alternative text |
| 21 | + |
| 22 | + |
| 23 | + |
| 24 | +Most of OPEA's microservices and megaservices support OpenTelemetry metrics, which are exported in Prometheus format via the ``/metrics`` endpoint. |
| 25 | +For further guidance, please refer to the section on `Telemetry Metrics <https://github.com/opea-project/GenAIComps/tree/main/comps/cores/telemetry#metrics>`_. |
| 26 | +Prometheus plays a crucial role in collecting metrics from OPEA service endpoints, while Grafana leverages Prometheus as a data source to visualize these metrics on pre-configured dashboards. |
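To make the Prometheus format mentioned above more concrete, here is a minimal sketch that parses a small sample of Prometheus text exposition output. The metric names and label values in ``SAMPLE_METRICS`` are hypothetical and are not taken from an actual OPEA service; ``parse_metrics`` is an illustrative helper, not part of OPEA or Prometheus.

```python
import re

# Hypothetical sample of what a /metrics endpoint might return; the metric
# names and labels are illustrative, not taken from an OPEA service.
SAMPLE_METRICS = """\
# HELP http_requests_total Total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="POST",handler="/v1/chatqna"} 12
http_requests_total{method="GET",handler="/metrics"} 3
# HELP process_cpu_seconds_total Total CPU time in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 4.5
"""

# A sample line is: metric name, optional {labels}, whitespace, value.
_SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# A label is: key="value", where the value may contain escaped characters.
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse_metrics(SAMPLE_METRICS)
for name, labels, value in samples:
    print(name, labels, value)
```

Against a running service, the same text could be fetched from the service's ``/metrics`` URL (for example with ``urllib.request.urlopen``) and passed to ``parse_metrics``.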
| 27 | + |
| 28 | +OPEA also supports OpenTelemetry tracing, with several OPEA GenAIExamples instrumented to trace key functions such as microservice execution and LLM generations. |
| 29 | +Additionally, tracing is enabled for Hugging Face's Text Embeddings Inference (TEI) and Text Generation Inference (TGI) services in select OPEA GenAIExamples. |
| 30 | +The Jaeger UI monitors trace events from OPEA microservices, TEI, and TGI. Once Jaeger endpoints are configured in OPEA microservices, TEI, and TGI, |
| 31 | +trace data will automatically be reported and visualized in the Jaeger UI. |
| 32 | + |
| 33 | + |
| 34 | +Deployment |
| 35 | +********** |
| 36 | + |
| 37 | +In the OpenTelemetry-enabled GenAIExamples, OpenTelemetry Metrics is activated by default, while OpenTelemetry Tracing is initially disabled. |
| 38 | +Similarly, the Telemetry UI services, including Grafana, Prometheus, and Jaeger, are also disabled by default. |
| 39 | +To enable OTel tracing along with Grafana, Prometheus, and Jaeger for an example, you can include an additional telemetry Docker Compose YAML file. |
| 40 | +For instance, passing ``compose.telemetry.yaml`` in addition to ``compose.yaml`` activates all telemetry features for the example. |
| 41 | + |
| 42 | + |
| 43 | +.. code-block:: bash |
| 44 | +
|
| 45 | + source ./set_env.sh |
| 46 | + docker compose -f compose.yaml -f compose.telemetry.yaml up -d |
| 47 | +
|
| 48 | +
|
| 49 | +Below are the GenAIExamples that include support for Grafana, Prometheus, and Jaeger services. |
| 50 | + |
| 51 | +.. toctree:: |
| 52 | + :maxdepth: 1 |
| 53 | + |
| 54 | + ChatQnA <deploy/ChatQnA> |
| 55 | + |
| 56 | +How to Monitor |
| 57 | +**************** |
| 58 | + |
| 59 | +OpenTelemetry metrics and tracing can be visualized through three primary monitoring UI web pages. |
| 60 | + |
| 61 | +1. Prometheus |
| 62 | ++++++++++++++++ |
| 63 | + |
| 64 | +The Prometheus UI provides insights into which services have active metrics endpoints. |
| 65 | +By default, Prometheus operates on port 9090. |
| 66 | +You can access the Prometheus UI web page using the following URL. |
| 67 | + |
| 68 | +.. code-block:: bash |
| 69 | +
|
| 70 | + http://${host_ip}:9090/targets |
| 71 | +
|
| 72 | +Services with accessible metrics endpoints will be marked as "up" in Prometheus. |
| 73 | +If a service is marked as "down," Grafana Dashboards will be unable to display the associated metrics information. |
| 74 | + |
| 75 | +.. image:: assets/prometheus.png |
| 76 | + :width: 800 |
| 77 | + :alt: Alternative text |
| 78 | + |
| 79 | +2. Grafana |
| 80 | ++++++++++++++++ |
| 81 | + |
| 82 | +The Grafana UI displays telemetry metrics through pre-defined dashboards, providing a clear visualization of data. |
| 83 | +For OPEA examples, Grafana is configured by default to use Prometheus as its data source, eliminating the need for manual setup. |
| 84 | +The Grafana UI web page can be accessed using the following URL. |
| 85 | + |
| 86 | +.. code-block:: bash |
| 87 | +
|
| 88 | + http://${host_ip}:3000 |
| 89 | +
|
| 90 | +
|
| 91 | +.. image:: assets/grafana_init.png |
| 92 | + :width: 800 |
| 93 | + :alt: Alternative text |
| 94 | + |
| 95 | + |
| 96 | +To view the pre-defined dashboards, click on the "Dashboard" tab located on the left-hand side of the Grafana UI. |
| 97 | +This will allow you to explore various dashboards that have been set up to visualize telemetry metrics effectively. |
| 98 | + |
| 99 | + |
| 100 | +.. image:: assets/grafana_dashboard_init.png |
| 101 | + :width: 800 |
| 102 | + :alt: Alternative text |
| 103 | + |
| 104 | +Detailed explanations for understanding each dashboard are provided within the telemetry sections of the respective GenAIExamples. |
| 105 | +These sections offer insights into how to interpret the data and utilize the dashboards effectively for monitoring and analysis. |
| 106 | + |
| 107 | +.. toctree:: |
| 108 | + :maxdepth: 1 |
| 109 | + |
| 110 | + ChatQnA <deploy/ChatQnA> |
| 111 | + |
| 112 | + |
| 113 | +3. Jaeger |
| 114 | ++++++++++++++++ |
| 115 | + |
| 116 | +The Jaeger UI is instrumental in understanding function tracing for each request, providing visibility into the execution flow and timing of microservices. |
| 117 | +OPEA traces the execution time for each microservice and monitors key functions within them. |
| 118 | +By default, Jaeger operates on port 16686. |
| 119 | +The Jaeger UI web page can be accessed using the following URL. |
| 120 | + |
| 121 | +.. code-block:: bash |
| 122 | +
|
| 123 | + http://${host_ip}:16686 |
| 124 | +
|
| 125 | +Traces will only appear in the Jaeger UI if the relevant functions have been executed. |
| 126 | +Therefore, without running the example, the UI will not display any trace data. |
| 127 | + |
| 128 | +.. image:: assets/jaeger_ui_init.png |
| 129 | + :width: 400 |
| 130 | + :alt: Alternative text |
| 131 | + |
| 132 | +Once the example is run, refresh the Jaeger UI webpage, and the OPEA service should appear under the "Services" tab, |
| 133 | +indicating that trace data is being captured and displayed. |
| 134 | + |
| 135 | +.. image:: assets/jaeger_ui_opea.png |
| 136 | + :width: 400 |
| 137 | + :alt: Alternative text |
| 138 | + |
| 139 | +Select "opea" as the service, then click the "Find Traces" button to view the trace data associated with the service's execution. |
| 140 | + |
| 141 | +.. image:: assets/jaeger_ui_opea_trace.png |
| 142 | + :width: 400 |
| 143 | + :alt: Alternative text |
| 144 | + |
| 145 | + |
| 146 | +All traces will be displayed on the UI. |
| 147 | +The diagram in the upper right corner provides a visual representation of all requests along the timeline. Meanwhile, |
| 148 | +the diagrams in the lower right corner illustrate all spans within each request, offering detailed insights into the execution flow and timing. |
| 149 | + |
| 150 | +.. image:: assets/jaeger_ui_opea_chatqna_1req.png |
| 151 | + :width: 800 |
| 152 | + :alt: Alternative text |
| 153 | + |
| 154 | +Detailed explanations for understanding each Jaeger diagram are provided within the telemetry sections of the respective GenAIExamples. |
| 155 | +These sections offer insights into how to interpret the data and use the diagrams effectively for monitoring and analysis. |
| 156 | + |
| 157 | +.. toctree:: |
| 158 | + :maxdepth: 1 |
| 159 | + |
| 160 | + ChatQnA <deploy/ChatQnA> |
| 161 | + |
| 162 | + |
| 163 | +Code Instrumentation for OPEA Tracing |
| 164 | +**************************************** |
| 165 | + |
| 166 | +Enabling OPEA OpenTelemetry tracing for a function is straightforward. |
| 167 | +First, import ``opea_telemetry``, and then apply the Python decorator ``@opea_telemetry`` to the function you wish to trace. |
| 168 | +Below is an example of how to trace ``your_func`` using OPEA tracing: |
| 169 | + |
| 170 | + |
| 171 | +.. code-block:: python |
| 172 | +
|
| 173 | + from comps import opea_telemetry |
| 174 | +
|
| 175 | + @opea_telemetry |
| 176 | + async def your_func(): |
| 177 | + pass |
| 178 | +
|
| 179 | +
|
| 180 | +
|
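For readers curious about what a tracing decorator of this kind does conceptually, here is a minimal, standard-library-only sketch. It is not the implementation of ``opea_telemetry``: the ``traced`` decorator and the ``RECORDED_SPANS`` list are hypothetical stand-ins, and a real implementation would hand spans to an OpenTelemetry exporter rather than append them to a list.

```python
import asyncio
import functools
import inspect
import time

# Stand-in for a real trace exporter: each entry is (function name, seconds).
RECORDED_SPANS = []


def traced(func):
    """Hypothetical decorator that records how long each call takes.

    Works for both regular and async functions; this is an illustration,
    not the implementation of opea_telemetry.
    """

    def record(start):
        RECORDED_SPANS.append((func.__name__, time.perf_counter() - start))

    if inspect.iscoroutinefunction(func):

        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await func(*args, **kwargs)
            finally:
                # Record the span even if the wrapped coroutine raises.
                record(start)

        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the span even if the wrapped function raises.
            record(start)

    return sync_wrapper


@traced
async def your_func():
    await asyncio.sleep(0.01)
    return "done"


@traced
def sync_func(x):
    return x * 2


result = asyncio.run(your_func())
doubled = sync_func(21)
for name, seconds in RECORDED_SPANS:
    print(f"{name} took {seconds:.4f} s")
```

Checking ``inspect.iscoroutinefunction`` up front keeps the wrapper's type consistent with the wrapped function, so an async function stays awaitable after decoration.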