Blog: Scaling OpenTelemetry Collectors using Ansible (#4182)

ishanjainn · web-flow · commit 97f51f48f7f3 · 2024-04-15T11:01:01.000+02:00
diff --git a/content/en/blog/2024/scaling-collectors.md b/content/en/blog/2024/scaling-collectors.md
@@ -0,0 +1,222 @@
+---
+title: Manage OpenTelemetry Collectors at scale with Ansible
+linkTitle: Collectors at scale with Ansible
+date: 2024-04-15
+author: '[Ishan Jain](https://github.com/ishanjainn) (Grafana)'
+cSpell:ignore: ansible associated Ishan ishanjainn Jain
+---
+
+You can scale the deployment of
+[OpenTelemetry Collector](/docs/collector/deployment/) across multiple Linux
+hosts through [Ansible](https://www.ansible.com/), to function both as
+[gateways](/docs/collector/deployment/gateway/) and
+[agents](/docs/collector/deployment/agent/) within your observability
+architecture. Using the OpenTelemetry Collector in this dual capacity enables
+robust collection and forwarding of metrics, traces, and logs to analysis and
+visualization platforms.
+
+This post outlines a strategy for deploying and managing OpenTelemetry
+Collector instances at scale across your infrastructure using Ansible. In the
+following example, we'll use [Grafana](https://grafana.com/) as the target
+backend for metrics.
+
+## Prerequisites
+
+Before we begin, make sure you meet the following requirements:
+
+- Ansible installed on your base system
+- SSH access to two or more Linux hosts
+- Prometheus configured to gather your metrics
+
+## Install the Grafana Ansible collection
+
+The
+[OpenTelemetry Collector role](https://github.com/grafana/grafana-ansible-collection/tree/main/roles/opentelemetry_collector)
+is provided through the
+[Grafana Ansible collection](https://docs.ansible.com/ansible/latest/collections/grafana/grafana/)
+as of release 4.0.
+
+To install the Grafana Ansible collection, run this command:
+
+```sh
+ansible-galaxy collection install grafana.grafana
+```
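+
+To keep every control machine on a release that includes the role, you can
+also declare the collection in a requirements file. The following is a minimal
+sketch: the file name `requirements.yml` and the version constraint are
+assumptions based on the 4.0 release mentioned above, so adjust them to your
+needs.
+
+```yaml
+# requirements.yml (hypothetical file name)
+collections:
+  - name: grafana.grafana
+    # Assumption: any release from 4.0.0 onward ships the role.
+    version: '>=4.0.0'
+```
+
+Install from the file with
+`ansible-galaxy collection install -r requirements.yml`.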
+
+## Create an Ansible inventory file
+
+Next, gather the IP addresses and URLs associated with your Linux hosts and
+create an inventory file.
+
+1. Create an Ansible inventory file.
+
+   An Ansible inventory, which resides in a file named `inventory`, lists each
+   host IP on a separate line, like this (8 hosts shown):
+
+   ```properties
+   10.0.0.1    # hostname = ubuntu-01
+   10.0.0.2    # hostname = ubuntu-02
+   10.0.0.3    # hostname = centos-01
+   10.0.0.4    # hostname = centos-02
+   10.0.0.5    # hostname = debian-01
+   10.0.0.6    # hostname = debian-02
+   10.0.0.7    # hostname = fedora-01
+   10.0.0.8    # hostname = fedora-02
+   ```
+
+2. Create an `ansible.cfg` file within the same directory as `inventory`, with
+   the following values:
+
+   ```ini
+   [defaults]
+   inventory = inventory  ; Path to the inventory file
+   private_key_file = ~/.ssh/id_rsa   ; Path to private SSH Key
+   remote_user = root
+   ```
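+
+Before deploying anything, it can help to confirm that Ansible reaches every
+host listed in `inventory` with the settings from `ansible.cfg`. The playbook
+below is a small sketch of such a check, not part of the Grafana collection; the
+file name `check-connectivity.yml` is hypothetical.
+
+```yaml
+# check-connectivity.yml (hypothetical file name)
+- name: Verify connectivity to all inventory hosts
+  hosts: all
+  gather_facts: false
+
+  tasks:
+    # ansible.builtin.ping logs in over SSH and runs a trivial Python module;
+    # it does not send ICMP packets.
+    - name: Confirm Ansible can log in and execute modules
+      ansible.builtin.ping:
+```
+
+Run it with `ansible-playbook check-connectivity.yml`. Hosts that fail here
+would also fail during the Collector deployment.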
+
+## Use the OpenTelemetry Collector Ansible role
+
+Next, define an Ansible playbook that applies the OpenTelemetry Collector role
+across your hosts.
+
+Create a file named `deploy-opentelemetry.yml` in the same directory as your
+`ansible.cfg` and `inventory` files:
+
+```yaml
+- name: Install OpenTelemetry Collector
+  hosts: all
+  become: true
+
+  tasks:
+    - name: Install OpenTelemetry Collector
+      ansible.builtin.include_role:
+        name: grafana.grafana.opentelemetry_collector
+      vars:
+        otel_collector_receivers:
+          hostmetrics:
+            collection_interval: 60s
+            scrapers:
+              cpu: {}
+              disk: {}
+              load: {}
+              filesystem: {}
+              memory: {}
+              network: {}
+              paging: {}
+              process:
+                mute_process_name_error: true
+                mute_process_exe_error: true
+                mute_process_io_error: true
+              processes: {}
+
+        otel_collector_processors:
+          batch:
+          resourcedetection:
+            detectors: [env, system]
+            timeout: 2s
+            system:
+              hostname_sources: [os]
+          transform/add_resource_attributes_as_metric_attributes:
+            error_mode: ignore
+            metric_statements:
+              - context: datapoint
+                statements:
+                  - set(attributes["deployment.environment"],
+                    resource.attributes["deployment.environment"])
+                  - set(attributes["service.version"],
+                    resource.attributes["service.version"])
+
+        otel_collector_exporters:
+          prometheusremotewrite:
+            endpoint: https://<prometheus-url>/api/prom/push
+            headers:
+              Authorization: 'Basic <base64-encoded-username:password>'
+
+        otel_collector_service:
+          pipelines:
+            metrics:
+              receivers: [hostmetrics]
+              processors:
+                [
+                  resourcedetection,
+                  transform/add_resource_attributes_as_metric_attributes,
+                  batch,
+                ]
+              exporters: [prometheusremotewrite]
+```
+
+{{% alert title="Note" %}}
+
+Adjust the configuration to match the specific telemetry you intend to collect
+and where you plan to forward it. This configuration snippet is a basic
+example designed for collecting host metrics and forwarding them to
+Prometheus.
+
+{{% /alert %}}
+
+This configuration provisions the OpenTelemetry Collector on each host in the
+inventory to collect that host's metrics.
+
+## Run the Ansible playbook
+
+Deploy the OpenTelemetry Collector across your hosts by running the following
+command:
+
+```sh
+ansible-playbook deploy-opentelemetry.yml
+```
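+
+After the run completes, you may want to check that a Collector service exists
+on each host. The following sketch lists services whose name contains `otel`.
+That match string is an assumption, because the actual service name depends on
+the Collector distribution the role installs, and the file name
+`check-collector.yml` is hypothetical.
+
+```yaml
+# check-collector.yml (hypothetical file name)
+- name: Check for an OpenTelemetry Collector service
+  hosts: all
+  become: true
+  gather_facts: false
+
+  tasks:
+    - name: Gather facts about services on the host
+      ansible.builtin.service_facts:
+
+    # Assumption: the Collector's service name contains "otel".
+    - name: Show matching services and their state
+      ansible.builtin.debug:
+        msg: >-
+          {{ ansible_facts.services | dict2items
+          | selectattr('key', 'search', 'otel')
+          | map(attribute='value') | list }}
+```
+
+An empty list for a host suggests that the service was not installed there or
+uses a different name.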
+
+## Check your metrics in the backend
+
+After your OpenTelemetry Collectors start sending metrics to Prometheus, follow
+these steps to visualize them in Grafana:
+
+### Set up Grafana
+
+1. **Install Docker**: Make sure Docker is installed on your system.
+
+2. **Run Grafana Docker Container**: Start a Grafana server with the following
+   command, which fetches the latest Grafana image:
+
+   ```sh
+   docker run -d -p 3000:3000 --name=grafana grafana/grafana
+   ```
+
+3. **Access Grafana**: Open <http://localhost:3000> in your web browser. The
+   default login username and password are both `admin`.
+
+4. **Change the password**: When prompted on first login, pick a secure one.
+
+For other installation methods and more detailed instructions, refer to the
+[official Grafana documentation](https://grafana.com/docs/grafana/latest/#installing-grafana).
+
+### Add Prometheus as a data source
+
+1. In Grafana, navigate to **Connections** > **Data Sources**.
+2. Click **Add data source** and select **Prometheus**.
+3. In the settings, enter your Prometheus URL, for example,
+   `http://<your_prometheus_host>`, along with any other necessary details.
+4. Select **Save & Test**.
+
+### Explore your metrics
+
+1. Go to the **Explore** page.
+2. In the Query editor, select your data source and enter the following query:
+
+   ```PromQL
+   100 - (avg by (cpu) (irate(system_cpu_time{state="idle"}[5m])) * 100)
+   ```
+
+   This query calculates the average percentage of CPU time not spent in the
+   "idle" state, across each CPU core, over the last 5 minutes.
+
+3. Explore other metrics and create dashboards to gain insights into your
+   system's performance.
+
+This blog post illustrated how you can configure and deploy multiple
+OpenTelemetry Collectors across various Linux hosts with the help of Ansible, as
+well as visualize collected telemetry in Grafana. If you found this useful, see
+the GitHub repository for the
+[OpenTelemetry Collector role](https://github.com/grafana/grafana-ansible-collection/tree/main/roles/opentelemetry_collector)
+for detailed configuration options. If you have questions, you can connect with
+me using the contact details on my GitHub profile,
+[@ishanjainn](https://github.com/ishanjainn).
diff --git a/static/refcache.json b/static/refcache.json
@@ -811,6 +811,10 @@
     "StatusCode": 206,
     "LastSeen": "2024-01-30T16:07:39.690877-05:00"
   },
+  "https://docs.ansible.com/ansible/latest/collections/grafana/grafana/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-03-19T11:21:52.991213698Z"
+  },
   "https://docs.appdynamics.com/latest/en/application-monitoring/appdynamics-for-opentelemetry": {
     "StatusCode": 200,
     "LastSeen": "2024-01-18T08:51:22.195056-05:00"
@@ -2595,6 +2599,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-30T16:14:36.112572-05:00"
   },
+  "https://github.com/ishanjainn": {
+    "StatusCode": 200,
+    "LastSeen": "2024-03-19T11:21:47.871135724Z"
+  },
   "https://github.com/jack-berg": {
     "StatusCode": 200,
     "LastSeen": "2024-01-18T20:04:54.949867-05:00"
@@ -4489,15 +4497,19 @@
   },
   "https://grafana.com/docs/alloy/latest/": {
     "StatusCode": 200,
-    "LastSeen": "2024-04-10T00:09:47.949842+02:00"
+    "LastSeen": "2024-04-12T20:40:28.798266582Z"
   },
   "https://grafana.com/docs/grafana-cloud/monitor-applications/application-observability/setup/instrument/dotnet/": {
     "StatusCode": 200,
-    "LastSeen": "2024-04-10T00:09:50.125651+02:00"
+    "LastSeen": "2024-04-12T20:40:30.368448693Z"
   },
   "https://grafana.com/docs/grafana-cloud/monitor-applications/application-observability/setup/instrument/java/": {
     "StatusCode": 200,
-    "LastSeen": "2024-04-10T00:09:55.400731+02:00"
+    "LastSeen": "2024-04-12T20:40:34.652514906Z"
+  },
+  "https://grafana.com/docs/grafana/latest/#installing-grafana": {
+    "StatusCode": 200,
+    "LastSeen": "2024-04-12T20:40:33.435682362Z"
   },
   "https://grafana.com/oss/opentelemetry/": {
     "StatusCode": 200,
@@ -7811,6 +7823,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-19T09:04:05.862693+01:00"
   },
+  "https://www.ansible.com/": {
+    "StatusCode": 200,
+    "LastSeen": "2024-03-19T11:21:48.883430689Z"
+  },
   "https://www.apollographql.com/docs/federation/": {
     "StatusCode": 206,
     "LastSeen": "2024-01-18T19:55:56.349642-05:00"