{fleet} provides built-in capabilities for monitoring your fleet of {agent}s. In {fleet}, you can:
- View the overall status of your {fleet}-managed agents.
- View details of an individual agent, including the health of the integrations in its agent policy.
- View a chronological list of agent activity.
- View agent logs and change the agent logging level.
- Request and download an agent diagnostics bundle.
- View the {agent} metrics dashboard.
- Change {agent} monitoring settings.
- Enable alerts and ML jobs based on {fleet} and {agent} status.
Agent monitoring is turned on by default in the agent policy. To turn off agent monitoring and stop collecting logs and metrics, see Change {agent} monitoring settings.
Want to receive an alert when your {agent} health status changes? Refer to Enable alerts and ML jobs based on {fleet} and {agent} status and our alerting example.
For more detail about how agents communicate their status to {fleet}, refer to {agent} health status.
To view the overall status of your {fleet}-managed agents, in {kib}, go to Management → {fleet} → Agents.
Important: The Agents tab in {fleet} displays a maximum of 10,000 agents, shown on 500 pages with 20 rows per page. If you have more than 10,000 agents, we recommend using the filtering and sorting options described in this section to narrow the table to fewer than 10,000 rows.
{agent}s can have the following statuses:
- Healthy: {agent}s are enrolled and checked in. There are no agent policy updates or automatic agent binary updates in progress, but the agent binary may still be out of date. {agent}s continuously check in to the {fleet-server} for required updates.
- Unhealthy: {agent}s have errors or are running in a degraded state. An agent is reported as Unhealthy, for example, when one of its components or integrations reports an error or degraded state; check the agent's details page or logs to diagnose the cause.
- Updating: {agent}s are updating the agent policy, updating the binary, or enrolling or unenrolling from {fleet}.
- Offline: {agent}s have stayed in an unhealthy status for a period of time. Offline agents' API keys remain valid. You can still see these {agent}s in the {fleet} UI and investigate them for further diagnosis if required.
- Inactive: {agent}s have been offline for longer than the time set in your inactivity timeout. These {agent}s are valid, but have been removed from the main {fleet} UI.
- Unenrolled: {agent}s have been manually unenrolled and their API keys have been removed from the system. If you determine that an offline agent is no longer valid, you can unenroll it using {agent} actions. These agents need to re-enroll in {fleet} to be operational again.
The following diagram shows the flow of {agent} statuses:
To filter the list of agents by status, click the Status dropdown and select one or more statuses.
For advanced filtering, use the search bar to create structured queries using {kibana-ref}/kuery-query.html[{kib} Query Language]. For example, enter local_metadata.os.family : "darwin" to see only agents running on macOS.
You can also sort the list of agents by host, last activity time, or version, by clicking on the table headings for those fields.
To perform a bulk action on more than 10,000 agents, you can select the Select everything on all pages button.
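The same KQL filters can also be used outside the UI. The following is a minimal sketch that lists agents through the {kib} Fleet API using Python and the requests library; it assumes the standard /api/fleet/agents endpoint, and the URL and API key shown are placeholders you would replace with your own.

    import requests

    # Placeholders -- replace with your own Kibana URL and API key.
    KIBANA_URL = "https://my-kibana.example.com:5601"
    API_KEY = "<your-api-key>"

    headers = {
        "Authorization": f"ApiKey {API_KEY}",
        "kbn-xsrf": "true",
    }

    # List agents matching a KQL filter, here the macOS example from above.
    response = requests.get(
        f"{KIBANA_URL}/api/fleet/agents",
        headers=headers,
        params={"kuery": 'local_metadata.os.family : "darwin"', "perPage": 20},
    )
    response.raise_for_status()

    # Each returned item is an agent document; print its ID and reported status.
    for agent in response.json()["items"]:
        print(agent["id"], agent.get("status"))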
In {fleet}, you can access the detailed status of an individual agent and the integrations that are associated with it through the agent policy.
- In {fleet}, open the Agents tab.
- In the Host column, click the agent's name.
On the Agent details tab, the Overview pane shows details about the agent and its performance, including its memory and CPU usage, last activity time, and last checkin message. To access metrics visualizations, you can also View the {agent} metrics dashboard.
The Integrations pane shows the status of the integrations that have been added to the agent policy. Expand any integration to view its health status. Any errors or warnings are displayed as alerts.
To gather more detail about a particular error or warning, select View agent JSON from the Actions menu. The JSON contains all of the raw agent data tracked by {fleet}.
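If you want the same raw agent document outside the UI, you can fetch it from the {kib} Fleet API. This is a minimal sketch in Python, assuming the /api/fleet/agents/<agent ID> endpoint is available in your version; the URL, API key, and agent ID are placeholders.

    import json
    import requests

    # Placeholders -- replace with your own values.
    KIBANA_URL = "https://my-kibana.example.com:5601"
    API_KEY = "<your-api-key>"
    AGENT_ID = "<agent-id>"

    headers = {"Authorization": f"ApiKey {API_KEY}", "kbn-xsrf": "true"}

    # Fetch the raw agent document tracked by Fleet.
    response = requests.get(f"{KIBANA_URL}/api/fleet/agents/{AGENT_ID}", headers=headers)
    response.raise_for_status()

    # The response wraps the agent document in an "item" field.
    print(json.dumps(response.json()["item"], indent=2))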
Note: Currently, the Integrations pane shows the health status only for agent inputs. Health status is not yet available for agent outputs.
You can view a chronological list of all operations performed by your {agent}s.
On the Agents tab, click Agent activity. All agent operations are shown, beginning with the most recent, including any operations still in progress.
When {fleet} reports an agent status like Offline or Unhealthy, you might want to view the agent logs to diagnose potential causes. If agent monitoring is configured to collect logs (the default), you can view agent logs in {fleet}.
- In {fleet}, open the Agents tab.
- In the Host column, click the agent's name.
- On the Agent details tab, verify that Monitor logs is enabled. If it's not, refer to Change {agent} monitoring settings.
- Click the Logs tab.
On the Logs tab you can filter, search, and explore the agent logs:
- Use the search bar to create structured queries using {kibana-ref}/kuery-query.html[{kib} Query Language].
- Choose one or more datasets to show logs for specific programs, such as {filebeat} or {fleet-server}.
- Change the log level to filter the view by log level. Want to see debugging logs? Refer to Change the logging level.
- Change the time range to view historical logs.
- Click Open in Logs to tail agent log files in real time. For more information about logging, refer to {observability-guide}/tail-logs.html[Tail log files].
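You can apply the same kinds of filters to the underlying log documents directly in {es}. The sketch below uses the official Python {es} client and assumes the default agent monitoring logs are written to data streams matching logs-elastic_agent*; the endpoint and API key are placeholders.

    from elasticsearch import Elasticsearch

    # Placeholders -- replace with your own Elasticsearch endpoint and API key.
    es = Elasticsearch("https://my-deployment.example.com:9200", api_key="<your-api-key>")

    # Most recent error-level agent log entries from the last hour.
    response = es.search(
        index="logs-elastic_agent*",  # assumed default monitoring data streams
        size=20,
        sort="@timestamp:desc",
        query={
            "bool": {
                "filter": [
                    {"term": {"log.level": "error"}},
                    {"range": {"@timestamp": {"gte": "now-1h"}}},
                ]
            }
        },
    )

    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        print(source["@timestamp"], source.get("message", ""))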
The logging level for monitored agents is set to info by default. You can change the agent logging level remotely, for example to turn on debug logging:
- After navigating to the Logs tab as described in View agent logs, scroll down to find the Agent logging level setting.
- Select an Agent logging level:
  - error: Logs errors and critical errors.
  - warning: Logs warnings, errors, and critical errors.
  - info: Logs informational messages, including the number of events that are published. Also logs any warnings, errors, or critical errors.
  - debug: Logs debug messages, including a detailed printout of all events flushed. Also logs informational messages, warnings, errors, and critical errors.
- Click Apply changes to apply the updated logging level to the agent.
{fleet} provides the ability to remotely generate and gather an {agent}'s diagnostics bundle.
An agent can gather and upload diagnostics if it is online in a Healthy or Unhealthy state.
To download the diagnostics bundle for local viewing:
- In {fleet}, open the Agents tab.
- In the Host column, click the agent's name.
- Select the Diagnostics tab and click the Request diagnostics .zip button.
- In the Request Diagnostics pop-up, select Collect additional CPU metrics if you'd like detailed CPU data.
- Click the Request diagnostics button.
When it's available, the new diagnostics bundle is listed on this page, along with any in-progress or previously collected bundles for the {agent}.
Note that the bundles are stored in {es} and are removed automatically after 7 days. You can also delete any previously created bundle by clicking the trash can icon.
When agent monitoring is configured to collect metrics (the default), you can use the [Elastic Agent] Agent metrics dashboard in {kib} to view details about {agent} resource usage, event throughput, and errors. This information can help you identify problems and make decisions about scaling your deployment.
To view agent metrics:
- In {fleet}, open the Agents tab.
- In the Host column, click the agent's name.
- On the Agent details tab, verify that Monitor metrics is enabled. If it's not, refer to Change {agent} monitoring settings.
- Click View more agent metrics to navigate to the [Elastic Agent] Agent metrics dashboard.
The dashboard uses standard {kib} visualizations that you can extend to meet your needs.
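If you'd rather inspect the underlying monitoring documents than the dashboard, you can query them directly. This is a minimal sketch using the official Python {es} client; it assumes the default agent metrics are written to data streams matching metrics-elastic_agent.*, and the endpoint and API key are placeholders.

    from elasticsearch import Elasticsearch

    # Placeholders -- replace with your own Elasticsearch endpoint and API key.
    es = Elasticsearch("https://my-deployment.example.com:9200", api_key="<your-api-key>")

    # Fetch the most recent agent monitoring metrics documents from the last 15 minutes.
    response = es.search(
        index="metrics-elastic_agent.*",  # assumed default monitoring data streams
        size=10,
        sort="@timestamp:desc",
        query={"range": {"@timestamp": {"gte": "now-15m"}}},
    )

    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        print(source["@timestamp"], source.get("data_stream", {}).get("dataset"))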
Agent monitoring is turned on by default in the agent policy. To change agent monitoring settings for all agents enrolled in a specific agent policy:
- In {fleet}, open the Agent policies tab.
- Click the agent policy to edit it, then click Settings.
- Under Agent monitoring, deselect (or select) one or both of these settings: Collect agent logs and Collect agent metrics.
- Under Advanced monitoring options you can configure additional settings including an HTTP monitoring endpoint, diagnostics rate limiting, and diagnostics file upload limits. Refer to configure agent monitoring for details.
- Save your changes.
To turn off agent monitoring when creating a new agent policy:
- In the Create agent policy flyout, expand Advanced options.
- Under Agent monitoring, deselect Collect agent logs and Collect agent metrics.
- When you're done configuring the agent policy, click Create agent policy.
You may want to store all of the health and status data about your {agents} in a remote {es} cluster, so that it’s separate and independent from the deployment where you use {fleet} to manage the agents.
To do so, follow the steps in [remote-elasticsearch-output]. After the new output is configured, follow the steps to update the {agent} policy and make sure that the Output for agent monitoring setting is enabled. {agent} monitoring data will use the remote {es} output that you configured.
You can access the health status of {fleet}-managed {agents} and other {fleet} settings through internal {fleet} indices. This enables you to leverage various applications within the {stack} that can be triggered by the provided information. For instance, you can now create alerts and machine learning (ML) jobs based on these specific fields. Refer to the {kibana-ref}/alerting-getting-started.html[Alerting documentation] or see the example on this page to learn how to define rules that can trigger actions when certain conditions are met.
This functionality allows you to effectively track an agent’s status, and identify scenarios where it has gone offline, is experiencing health issues, or is facing challenges related to input or output.
The following data streams and fields are available.

Data stream: metrics-fleet_server.agent_status-default

This data stream publishes the number of {agents} in various states.

Fields:
- @timestamp
- fleet.agents.total: A count of all agents
- fleet.agents.enrolled: A count of all agents currently enrolled
- fleet.agents.unenrolled: A count of agents currently unenrolled
- fleet.agents.healthy: A count of agents currently healthy
- fleet.agents.offline: A count of agents currently offline
- fleet.agents.updating: A count of agents currently in the process of updating
- fleet.agents.unhealthy: A count of agents currently unhealthy
- fleet.agents.inactive: A count of agents currently inactive

Note: Other fields regarding agent status, based on input and output health, are currently under consideration for future development.

Data stream: metrics-fleet_server.agent_versions-default

This data stream publishes a separate document for each version number, with a count of enrolled agents only.

Fields:
- @timestamp
- fleet.agent.version: A keyword field containing the version number
- fleet.agent.count: A count of agents on the specified version
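For example, the following minimal sketch uses the official Python {es} client to read the most recent agent status counts from the metrics-fleet_server.agent_status-default data stream described above; the endpoint and API key are placeholders you would replace with your own.

    from elasticsearch import Elasticsearch

    # Placeholders -- replace with your own Elasticsearch endpoint and API key.
    es = Elasticsearch("https://my-deployment.example.com:9200", api_key="<your-api-key>")

    # Fetch the most recently published agent status document and return only
    # the fleet.agents.* fields.
    response = es.search(
        index="metrics-fleet_server.agent_status-default",
        size=1,
        sort="@timestamp:desc",
        fields=["fleet.agents.*"],
        source=False,
    )

    hits = response["hits"]["hits"]
    if hits:
        for field, values in sorted(hits[0]["fields"].items()):
            print(field, values[0])
    else:
        print("No agent status documents found.")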
You can set up an alert to notify you when one or more {agent}s go offline:
- In {kib}, navigate to Management > Stack Management > Rules.
- Click Create rule.
- Select Elasticsearch query as the rule type.
- Choose a name for the rule, for example Elastic Agent status.
- Select KQL or Lucene as the query type.
- Select DATA VIEW metrics-* as the data view.
- Define your query, for example: fleet.agents.offline >= 1.
- Set the alert group, threshold, and time window. For example:
  - WHEN: count()
  - OVER: all documents
  - IS ABOVE: 0
  - FOR THE LAST: 5 minutes
  This generates an alert when the fleet.agents.offline field reports one or more agents as offline at any point during the last five minutes.
- Set the number of documents to send, for example:
  - SIZE: 100
- Set Check every to the frequency at which the rule condition should be evaluated. The default setting is one minute.
- Select an action to occur when the rule conditions are met. For example, to send an email when an alert occurs, select the Email connector type and specify:
  - Email connector: Elastic-Cloud-SMTP
  - Action frequency: For each alert and On check intervals
  - Run when: Query matched
  - To: <the recipient email address>
  - Subject: <the email subject line>
- Click Save.
The new rule will be enabled and an email will be sent to the specified recipient when the alert conditions are met.
From the Rules page you can select the rule you created to enable or disable it, and to view the rule details including a list of active alerts and an alert history.
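To sanity-check the example rule's query outside of {kib}, you can run an equivalent search against the same data. This is a minimal sketch using the official Python {es} client; the endpoint and API key are placeholders, and the 5-minute window mirrors the example rule above.

    from elasticsearch import Elasticsearch

    # Placeholders -- replace with your own Elasticsearch endpoint and API key.
    es = Elasticsearch("https://my-deployment.example.com:9200", api_key="<your-api-key>")

    # Count documents from the last 5 minutes where one or more agents are offline.
    response = es.count(
        index="metrics-*",
        query={
            "bool": {
                "filter": [
                    {"range": {"fleet.agents.offline": {"gte": 1}}},
                    {"range": {"@timestamp": {"gte": "now-5m"}}},
                ]
            }
        },
    )

    # The example rule fires when this count is above 0.
    print("Matching documents in the last 5 minutes:", response["count"])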