Commit 97f51f4

Blog: Scaling OpenTelemetry Collectors using Ansible (#4182)
1 parent 4328478 commit 97f51f4

2 files changed, +241 -3 lines changed

@@ -0,0 +1,222 @@
---
title: Manage OpenTelemetry Collectors at scale with Ansible
linkTitle: Collectors at scale with Ansible
date: 2024-04-15
author: '[Ishan Jain](https://github.com/ishanjainn) (Grafana)'
cSpell:ignore: ansible associated Ishan ishanjainn Jain
---

You can use [Ansible](https://www.ansible.com/) to deploy the
[OpenTelemetry Collector](/docs/collector/deployment/) across multiple Linux
hosts, where it can run both as a
[gateway](/docs/collector/deployment/gateway/) and as an
[agent](/docs/collector/deployment/agent/) within your observability
architecture. Running the OpenTelemetry Collector in both roles gives you a
robust way to collect metrics, traces, and logs and forward them to analysis and
visualization platforms.

We outline a strategy for deploying and managing OpenTelemetry Collector
instances at scale throughout your infrastructure using Ansible. In the
following example, we'll use [Grafana](https://grafana.com/) as the target
backend for metrics.

## Prerequisites

Before we begin, make sure you meet the following requirements:

- Ansible installed on your base system
- SSH access to two or more Linux hosts
- Prometheus configured to gather your metrics

## Install the Grafana Ansible collection

The
[OpenTelemetry Collector role](https://github.com/grafana/grafana-ansible-collection/tree/main/roles/opentelemetry_collector)
is provided through the
[Grafana Ansible collection](https://docs.ansible.com/ansible/latest/collections/grafana/grafana/)
as of release 4.0.

To install the Grafana Ansible collection, run this command:

```sh
ansible-galaxy collection install grafana.grafana
```

## Create an Ansible inventory file

Next, gather the IP addresses and URLs associated with your Linux hosts and
create an inventory file.

1. Create an Ansible inventory file.

   An Ansible inventory, which resides in a file named `inventory`, lists each
   host IP on a separate line, like this (8 hosts shown):

   ```properties
   10.0.0.1  # hostname = ubuntu-01
   10.0.0.2  # hostname = ubuntu-02
   10.0.0.3  # hostname = centos-01
   10.0.0.4  # hostname = centos-02
   10.0.0.5  # hostname = debian-01
   10.0.0.6  # hostname = debian-02
   10.0.0.7  # hostname = fedora-01
   10.0.0.8  # hostname = fedora-02
   ```

2. Create an `ansible.cfg` file within the same directory as `inventory`, with
   the following values:

   ```toml
   [defaults]
   # Comments sit on their own lines: in ansible.cfg, "#" is not an inline comment marker.
   # Path to the inventory file
   inventory = inventory
   # Path to the private SSH key
   private_key_file = ~/.ssh/id_rsa
   remote_user = root
   ```

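
Before moving on, it can help to confirm that Ansible reads the inventory and can reach every host over SSH. The following is a minimal sketch rather than a required step: the `preflight` function name is hypothetical, and it assumes you call it from the directory that contains `ansible.cfg` and `inventory`.

```sh
# Hypothetical helper: check the inventory and SSH access before deploying.
preflight() {
  # Print the hosts that Ansible resolved from the inventory file.
  ansible all --list-hosts || return 1
  # Connect to every host and run the built-in ping module.
  ansible all -m ansible.builtin.ping
}

# Call it from the directory that holds ansible.cfg and inventory:
# preflight
```

A host that fails this check would most likely also fail during the playbook run, so it is cheaper to fix SSH keys or addresses at this point.
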
## Use the OpenTelemetry Collector Ansible role

Next, define an Ansible playbook that applies the OpenTelemetry Collector role
to your hosts.

Create a file named `deploy-opentelemetry.yml` in the same directory as your
`ansible.cfg` and `inventory` files:

```yaml
- name: Install OpenTelemetry Collector
  hosts: all
  become: true

  tasks:
    - name: Install OpenTelemetry Collector
      ansible.builtin.include_role:
        name: grafana.grafana.opentelemetry_collector
      vars:
        otel_collector_receivers:
          hostmetrics:
            collection_interval: 60s
            scrapers:
              cpu: {}
              disk: {}
              load: {}
              filesystem: {}
              memory: {}
              network: {}
              paging: {}
              process:
                mute_process_name_error: true
                mute_process_exe_error: true
                mute_process_io_error: true
              processes: {}

        otel_collector_processors:
          batch:
          resourcedetection:
            detectors: [env, system]
            timeout: 2s
            system:
              hostname_sources: [os]
          transform/add_resource_attributes_as_metric_attributes:
            error_mode: ignore
            metric_statements:
              - context: datapoint
                statements:
                  - set(attributes["deployment.environment"],
                    resource.attributes["deployment.environment"])
                  - set(attributes["service.version"],
                    resource.attributes["service.version"])

        otel_collector_exporters:
          prometheusremotewrite:
            endpoint: https://<prometheus-url>/api/prom/push
            headers:
              Authorization: 'Basic <base64-encoded-username:password>'

        otel_collector_service:
          pipelines:
            metrics:
              receivers: [hostmetrics]
              processors:
                [
                  resourcedetection,
                  transform/add_resource_attributes_as_metric_attributes,
                  batch,
                ]
              exporters: [prometheusremotewrite]
```

{{% alert title="Note" %}}

Adjust the configuration to match the specific telemetry you intend to collect
and where you plan to forward it. This configuration snippet is a basic example
designed for collecting host metrics that get forwarded to Prometheus.

{{% /alert %}}

The preceding configuration provisions the OpenTelemetry Collector to collect
host metrics from each Linux host.

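
The `Authorization` header in the exporter configuration expects the Base64 encoding of `username:password`. Here is a minimal sketch for producing that value in a shell, assuming the `base64` utility is available on your system. The `basic_auth_header` function name and the example credentials are hypothetical placeholders.

```sh
# Hypothetical helper: print the Authorization header value for Basic auth.
basic_auth_header() {
  # printf writes "user:password" without the trailing newline echo would add.
  # tr removes line wraps that some base64 implementations insert.
  encoded=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
  printf 'Basic %s\n' "$encoded"
}

# Placeholder credentials; substitute the ones for your Prometheus endpoint.
basic_auth_header 'example-user' 'example-password'
```

The printed value goes into the `Authorization` field. Because Base64 is an encoding and not encryption, you may want to keep the value in Ansible Vault instead of storing it in plain text in the playbook.
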
## Running the Ansible playbook

Deploy the OpenTelemetry Collector across your hosts by running the following
command:

```sh
ansible-playbook deploy-opentelemetry.yml
```

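
On a large inventory, you may prefer to validate the playbook and try it on one host before rolling it out everywhere. The following sketch shows one such staged approach using the standard `--syntax-check` and `--limit` options of `ansible-playbook`. The `staged_deploy` function name is hypothetical, and `10.0.0.1` is simply the first host from the example inventory.

```sh
# Hypothetical helper: validate, deploy to one host, then deploy everywhere.
staged_deploy() {
  # Parse the playbook without contacting any host.
  ansible-playbook deploy-opentelemetry.yml --syntax-check || return 1
  # Apply the play to a single host first (address from the example inventory).
  ansible-playbook deploy-opentelemetry.yml --limit 10.0.0.1 || return 1
  # Roll out to every host in the inventory.
  ansible-playbook deploy-opentelemetry.yml
}

# Call it from the directory that holds the playbook:
# staged_deploy
```

Each stage stops the rollout if the previous one fails, so a broken configuration reaches at most one host.
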
## Check your metrics in the backend

After your OpenTelemetry Collectors start sending metrics to Prometheus, follow
these steps to visualize them in Grafana:

### Set up Grafana

1. **Install Docker**: Make sure Docker is installed on your system.

2. **Run Grafana Docker Container**: Start a Grafana server with the following
   command, which fetches the latest Grafana image:

   ```sh
   docker run -d -p 3000:3000 --name=grafana grafana/grafana
   ```

3. **Access Grafana**: Open <http://localhost:3000> in your web browser. The
   default login username and password are both `admin`.

4. **Change the password**: When prompted on first login, pick a secure
   password.

For other installation methods and more detailed instructions, refer to the
[official Grafana documentation](https://grafana.com/docs/grafana/latest/#installing-grafana).

### Add Prometheus as a data source

1. In Grafana, navigate to **Connections** > **Data Sources**.
2. Click **Add data source** and select **Prometheus**.
3. In the settings, enter your Prometheus URL, for example,
   `http://<your_prometheus_host>`, along with any other necessary details.
4. Select **Save & Test**.

### Explore your metrics

1. Go to the **Explore** page.
2. In the Query editor, select your data source and enter the following query:

   ```PromQL
   100 - (avg by (cpu) (irate(system_cpu_time{state="idle"}[5m])) * 100)
   ```

   This query calculates the average percentage of CPU time not spent in the
   "idle" state, across each CPU core, over the last 5 minutes.

3. Explore other metrics and create dashboards to gain insights into your
   system's performance.

This blog post illustrated how you can configure and deploy multiple
OpenTelemetry Collectors across various Linux hosts with the help of Ansible, as
well as visualize collected telemetry in Grafana. If you find this useful, see
the GitHub repository for the
[OpenTelemetry Collector role](https://github.com/grafana/grafana-ansible-collection/tree/main/roles/opentelemetry_collector)
for detailed configuration options. If you have questions, you can connect with
me using the contact details on my GitHub profile
[@ishanjainn](https://github.com/ishanjainn).

static/refcache.json

+19 -3
@@ -811,6 +811,10 @@
     "StatusCode": 206,
     "LastSeen": "2024-01-30T16:07:39.690877-05:00"
   },
+  "https://docs.ansible.com/ansible/latest/collections/grafana/grafana/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-03-19T11:21:52.991213698Z"
+  },
   "https://docs.appdynamics.com/latest/en/application-monitoring/appdynamics-for-opentelemetry": {
     "StatusCode": 200,
     "LastSeen": "2024-01-18T08:51:22.195056-05:00"
@@ -2595,6 +2599,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-30T16:14:36.112572-05:00"
   },
+  "https://github.com/ishanjainn": {
+    "StatusCode": 200,
+    "LastSeen": "2024-03-19T11:21:47.871135724Z"
+  },
   "https://github.com/jack-berg": {
     "StatusCode": 200,
     "LastSeen": "2024-01-18T20:04:54.949867-05:00"
@@ -4489,15 +4497,19 @@
   },
   "https://grafana.com/docs/alloy/latest/": {
     "StatusCode": 200,
-    "LastSeen": "2024-04-10T00:09:47.949842+02:00"
+    "LastSeen": "2024-04-12T20:40:28.798266582Z"
   },
   "https://grafana.com/docs/grafana-cloud/monitor-applications/application-observability/setup/instrument/dotnet/": {
     "StatusCode": 200,
-    "LastSeen": "2024-04-10T00:09:50.125651+02:00"
+    "LastSeen": "2024-04-12T20:40:30.368448693Z"
   },
   "https://grafana.com/docs/grafana-cloud/monitor-applications/application-observability/setup/instrument/java/": {
     "StatusCode": 200,
-    "LastSeen": "2024-04-10T00:09:55.400731+02:00"
+    "LastSeen": "2024-04-12T20:40:34.652514906Z"
+  },
+  "https://grafana.com/docs/grafana/latest/#installing-grafana": {
+    "StatusCode": 200,
+    "LastSeen": "2024-04-12T20:40:33.435682362Z"
   },
   "https://grafana.com/oss/opentelemetry/": {
     "StatusCode": 200,
@@ -7811,6 +7823,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-19T09:04:05.862693+01:00"
   },
+  "https://www.ansible.com/": {
+    "StatusCode": 200,
+    "LastSeen": "2024-03-19T11:21:48.883430689Z"
+  },
   "https://www.apollographql.com/docs/federation/": {
     "StatusCode": 206,
     "LastSeen": "2024-01-18T19:55:56.349642-05:00"
