| 1 | +--- |
| 2 | +title: Manage OpenTelemetry Collectors at scale with Ansible |
| 3 | +linkTitle: Collectors at scale with Ansible |
| 4 | +date: 2024-04-15 |
| 5 | +author: '[Ishan Jain](https://github.com/ishanjainn) (Grafana)' |
| 6 | +cSpell:ignore: ansible associated Ishan ishanjainn Jain |
| 7 | +--- |
| 8 | + |
| 9 | +You can use [Ansible](https://www.ansible.com/) to scale the deployment of the |
| 10 | +[OpenTelemetry Collector](/docs/collector/deployment/) across multiple Linux |
| 11 | +hosts, where it can function both as a |
| 12 | +[gateway](/docs/collector/deployment/gateway/) and as an |
| 13 | +[agent](/docs/collector/deployment/agent/) within your observability |
| 14 | +architecture. Using the OpenTelemetry Collector in this dual capacity enables |
| 15 | +robust collection and forwarding of metrics, traces, and logs to analysis and |
| 16 | +visualization platforms. |
| 17 | + |
| 18 | +This post outlines a strategy for deploying and managing OpenTelemetry |
| 19 | +Collector instances at scale throughout your infrastructure using Ansible. In |
| 20 | +the following example, we'll send metrics to Prometheus and visualize them in |
| 21 | +[Grafana](https://grafana.com/). |
| 22 | + |
| 23 | +## Prerequisites |
| 24 | + |
| 25 | +Before we begin, make sure you meet the following requirements: |
| 26 | + |
| 27 | +- Ansible installed on your control node (the machine you run Ansible from) |
| 28 | +- SSH access to two or more Linux hosts |
| 29 | +- Prometheus configured to gather your metrics |
| 30 | + |
| 31 | +## Install the Grafana Ansible collection |
| 32 | + |
| 33 | +The |
| 34 | +[OpenTelemetry Collector role](https://github.com/grafana/grafana-ansible-collection/tree/main/roles/opentelemetry_collector) |
| 35 | +is provided through the |
| 36 | +[Grafana Ansible collection](https://docs.ansible.com/ansible/latest/collections/grafana/grafana/) |
| 37 | +as of release 4.0. |
| 38 | + |
| 39 | +To install the Grafana Ansible collection, run this command: |
| 40 | + |
| 41 | +```sh |
| 42 | +ansible-galaxy collection install grafana.grafana |
| 43 | +``` |
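
If you prefer to keep the collection version under source control, a `requirements.yml` file is one option. The following is a minimal sketch; the file name and the version constraint are assumptions based on the role being available from release 4.0 onward:

```yaml
# requirements.yml (hypothetical file name; any path works with -r)
collections:
  - name: grafana.grafana
    # Assumption: any release from 4.0 onward includes the
    # opentelemetry_collector role.
    version: '>=4.0.0'
```

You can then install from the file with `ansible-galaxy collection install -r requirements.yml`.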
| 44 | + |
| 45 | +## Create an Ansible inventory file |
| 46 | + |
| 47 | +Next, gather the IP addresses and URLs associated with your Linux hosts and |
| 48 | +create an inventory file. |
| 49 | + |
| 50 | +1. Create an Ansible inventory file. |
| 51 | + |
| 52 | + An Ansible inventory, which resides in a file named `inventory`, lists each |
| 53 | + host IP on a separate line, like this (8 hosts shown): |
| 54 | + |
| 55 | + ```properties |
| 56 | + 10.0.0.1 # hostname = ubuntu-01 |
| 57 | + 10.0.0.2 # hostname = ubuntu-02 |
| 58 | + 10.0.0.3 # hostname = centos-01 |
| 59 | + 10.0.0.4 # hostname = centos-02 |
| 60 | + 10.0.0.5 # hostname = debian-01 |
| 61 | + 10.0.0.6 # hostname = debian-02 |
| 62 | + 10.0.0.7 # hostname = fedora-01 |
| 63 | + 10.0.0.8 # hostname = fedora-02 |
| 64 | + ``` |
| 65 | + |
| 66 | +2. Create an `ansible.cfg` file within the same directory as `inventory`, with |
| 67 | + the following values: |
| 68 | + |
| 69 | + ```toml |
| 70 | + [defaults] |
| 71 | + inventory = inventory ; Path to the inventory file |
| 72 | + private_key_file = ~/.ssh/id_rsa ; Path to private SSH key |
| 73 | + remote_user = root |
| 74 | + ``` |
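
Because the Collector can run as either a gateway or an agent, it may help to group hosts by role in the inventory. The sketch below assumes two hypothetical group names, `otel_gateways` and `otel_agents`, and an arbitrary split of the eight example hosts:

```properties
# Hypothetical grouping: adjust the group names and membership to your setup.
[otel_gateways]
10.0.0.1
10.0.0.2

[otel_agents]
10.0.0.3
10.0.0.4
10.0.0.5
10.0.0.6
10.0.0.7
10.0.0.8
```

A playbook can then target one group with `hosts: otel_agents` instead of `hosts: all`. To confirm that Ansible can reach every host before deploying anything, you can run `ansible all -m ansible.builtin.ping`.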
| 75 | + |
| 76 | +## Use the OpenTelemetry Collector Ansible role |
| 77 | + |
| 78 | +Next, define an Ansible playbook that applies the OpenTelemetry Collector role |
| 79 | +across your hosts. |
| 80 | + |
| 81 | +Create a file named `deploy-opentelemetry.yml` in the same directory as your |
| 82 | +`ansible.cfg` and `inventory` files: |
| 83 | + |
| 84 | +```yaml |
| 85 | +- name: Install OpenTelemetry Collector |
| 86 | +  hosts: all |
| 87 | +  become: true |
| 88 | + |
| 89 | +  tasks: |
| 90 | +    - name: Install OpenTelemetry Collector |
| 91 | +      ansible.builtin.include_role: |
| 92 | +        name: grafana.grafana.opentelemetry_collector |
| 93 | +      vars: |
| 94 | +        otel_collector_receivers: |
| 95 | +          hostmetrics: |
| 96 | +            collection_interval: 60s |
| 97 | +            scrapers: |
| 98 | +              cpu: {} |
| 99 | +              disk: {} |
| 100 | +              load: {} |
| 101 | +              filesystem: {} |
| 102 | +              memory: {} |
| 103 | +              network: {} |
| 104 | +              paging: {} |
| 105 | +              process: |
| 106 | +                mute_process_name_error: true |
| 107 | +                mute_process_exe_error: true |
| 108 | +                mute_process_io_error: true |
| 109 | +              processes: {} |
| 110 | + |
| 111 | +        otel_collector_processors: |
| 112 | +          batch: |
| 113 | +          resourcedetection: |
| 114 | +            detectors: [env, system] |
| 115 | +            timeout: 2s |
| 116 | +            system: |
| 117 | +              hostname_sources: [os] |
| 118 | +          transform/add_resource_attributes_as_metric_attributes: |
| 119 | +            error_mode: ignore |
| 120 | +            metric_statements: |
| 121 | +              - context: datapoint |
| 122 | +                statements: |
| 123 | +                  - set(attributes["deployment.environment"], |
| 124 | +                    resource.attributes["deployment.environment"]) |
| 125 | +                  - set(attributes["service.version"], |
| 126 | +                    resource.attributes["service.version"]) |
| 127 | + |
| 128 | +        otel_collector_exporters: |
| 129 | +          prometheusremotewrite: |
| 130 | +            endpoint: https://<prometheus-url>/api/prom/push |
| 131 | +            headers: |
| 132 | +              Authorization: 'Basic <base64-encoded-username:password>' |
| 133 | + |
| 134 | +        otel_collector_service: |
| 135 | +          pipelines: |
| 136 | +            metrics: |
| 137 | +              receivers: [hostmetrics] |
| 138 | +              processors: |
| 139 | +                [ |
| 140 | +                  resourcedetection, |
| 141 | +                  transform/add_resource_attributes_as_metric_attributes, |
| 142 | +                  batch, |
| 143 | +                ] |
| 144 | +              exporters: [prometheusremotewrite] |
| 145 | +``` |
| 146 | + |
| 147 | +{{% alert title="Note" %}} |
| 148 | + |
| 149 | +Adjust the configuration to match the specific telemetry you intend to collect |
| 150 | +as well as where you plan to forward it to. This configuration snippet is a |
| 151 | +basic example designed for collecting host metrics that get forwarded to |
| 152 | +Prometheus. |
| 153 | + |
| 154 | +{{% /alert %}} |
| 155 | + |
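
As one illustration of adjusting the configuration, the role variables used in `deploy-opentelemetry.yml` could describe a gateway-style Collector that receives traces over OTLP and forwards them to a tracing backend. This is a sketch under stated assumptions, not a configuration from the role's documentation: the listen addresses use the conventional OTLP ports, and `<otlp-backend-host>` is a placeholder you would replace.

```yaml
# Sketch of alternative role variables for a gateway-style Collector.
otel_collector_receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317 # conventional OTLP/gRPC port
      http:
        endpoint: 0.0.0.0:4318 # conventional OTLP/HTTP port

otel_collector_processors:
  batch:

otel_collector_exporters:
  otlp:
    # Placeholder: replace with your tracing backend's OTLP endpoint.
    endpoint: <otlp-backend-host>:4317

otel_collector_service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

These variables would take the place of the `vars` block in the playbook for the hosts you want to act as gateways.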
| 156 | +The `deploy-opentelemetry.yml` playbook provisions the OpenTelemetry Collector |
| 157 | +on each Linux host in your inventory to collect host metrics. |
| 158 | + |
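
The `Authorization` header of the `prometheusremotewrite` exporter expects the string `username:password` encoded in Base64. The helper below is a minimal sketch of one way to produce that value; the function name `basic_auth_header` and the credentials `user` and `pass` are placeholders for illustration only.

```sh
#!/bin/sh
# Build the value for an HTTP Basic Authorization header.
# Usage: basic_auth_header <username> <password>
basic_auth_header() {
  # printf avoids the trailing newline that echo would add to the input;
  # tr removes any line wrapping that some base64 implementations insert.
  printf 'Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Example with placeholder credentials.
basic_auth_header "user" "pass"
# prints: Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption, so treat the resulting header like a password. Consider keeping it out of version control, for example by storing it in a file encrypted with `ansible-vault`.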
| 159 | +## Run the Ansible playbook |
| 160 | + |
| 161 | +Deploy the OpenTelemetry Collector across your hosts by running the following |
| 162 | +command: |
| 163 | + |
| 164 | +```sh |
| 165 | +ansible-playbook deploy-opentelemetry.yml |
| 166 | +``` |
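
After the run completes, you may want to confirm that the Collector service is active on every host. The playbook below is a sketch of one way to do that with Ansible's built-in modules. The file name `verify-opentelemetry.yml` is hypothetical, and the service name `otelcol-contrib.service` is an assumption: check the role's documentation or run `systemctl list-units` on a host to find the actual unit name.

```yaml
# verify-opentelemetry.yml (hypothetical file name)
- name: Verify OpenTelemetry Collector service
  hosts: all
  become: true
  vars:
    # Assumption: the systemd unit name used by your installation.
    otel_service_name: otelcol-contrib.service

  tasks:
    - name: Gather service facts
      ansible.builtin.service_facts:

    - name: Assert that the Collector service is running
      ansible.builtin.assert:
        that:
          - otel_service_name in ansible_facts.services
          - ansible_facts.services[otel_service_name].state == 'running'
        fail_msg: '{{ otel_service_name }} is not running on this host'
```

Run it the same way as the deployment playbook: `ansible-playbook verify-opentelemetry.yml`. To preview changes on later deployments without applying them, `ansible-playbook` also accepts the `--check` and `--diff` flags.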
| 167 | + |
| 168 | +## Check your metrics in the backend |
| 169 | + |
| 170 | +After your OpenTelemetry Collectors start sending metrics to Prometheus, follow |
| 171 | +these steps to visualize them in Grafana: |
| 172 | + |
| 173 | +### Set up Grafana |
| 174 | + |
| 175 | +1. **Install Docker**: Make sure Docker is installed on your system. |
| 176 | + |
| 177 | +2. **Run Grafana Docker Container**: Start a Grafana server with the following |
| 178 | + command, which fetches the latest Grafana image: |
| 179 | + |
| 180 | + ```sh |
| 181 | + docker run -d -p 3000:3000 --name=grafana grafana/grafana |
| 182 | + ``` |
| 183 | + |
| 184 | +3. **Access Grafana**: Open <http://localhost:3000> in your web browser. The |
| 185 | + default login username and password are both `admin`. |
| 186 | + |
| 187 | +4. **Change the password**: When prompted on first login, pick a secure one. |
| 188 | + |
| 189 | +For other installation methods and more detailed instructions, refer to the |
| 190 | +[official Grafana documentation](https://grafana.com/docs/grafana/latest/#installing-grafana). |
| 191 | + |
| 192 | +### Add Prometheus as a data source |
| 193 | + |
| 194 | +1. In Grafana, navigate to **Connections** > **Data Sources**. |
| 195 | +2. Click **Add data source** and select **Prometheus**. |
| 196 | +3. In the settings, enter your Prometheus URL, for example, |
| 197 | + `http://<your_prometheus_host>`, along with any other necessary details. |
| 198 | +4. Select **Save & Test**. |
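
If you would rather define the data source as code than click through the UI, Grafana can also load data sources from provisioning files at startup. The following is a minimal sketch, assuming the file is placed under Grafana's `provisioning/datasources/` directory (for the Docker image, typically `/etc/grafana/provisioning/datasources/`) and that no authentication is needed to reach Prometheus:

```yaml
# prometheus.yaml (hypothetical file name)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Placeholder: same URL you would enter in the UI.
    url: http://<your_prometheus_host>
    isDefault: true
```

With the Docker setup shown earlier, you would mount this file into the container with a `-v` option on `docker run`.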
| 199 | + |
| 200 | +### Explore your metrics |
| 201 | + |
| 202 | +1. Go to the **Explore** page. |
| 203 | +2. In the Query editor, select your data source and enter the following query: |
| 204 | + |
| 205 | + ```PromQL |
| 206 | + 100 - (avg by (cpu) (irate(system_cpu_time{state="idle"}[5m])) * 100) |
| 207 | + ``` |
| 208 | + |
| 209 | + This query calculates, for each CPU core, the percentage of CPU time not |
| 210 | + spent in the `idle` state, based on the most recent samples within a 5-minute window. |
| 211 | + |
| 212 | +3. Explore other metrics and create dashboards to gain insights into your |
| 213 | + system's performance. |
| 214 | + |
| 215 | +This blog post illustrated how you can configure and deploy multiple |
| 216 | +OpenTelemetry Collectors across various Linux hosts with the help of Ansible, as |
| 217 | +well as visualize collected telemetry in Grafana. If you found this useful, see |
| 218 | +the GitHub repository for the |
| 219 | +[OpenTelemetry Collector role](https://github.com/grafana/grafana-ansible-collection/tree/main/roles/opentelemetry_collector) |
| 220 | +for detailed configuration options. If you have questions, you can connect with |
| 221 | +me using the contact details on my GitHub profile |
| 222 | +[@ishanjainn](https://github.com/ishanjainn). |