{agent} health status

The {agent} monitoring documentation describes the features available through the {fleet} UI for you to view {agent} status and activity, access metrics and diagnostics, enable alerts, and more.

For details about how the {agent} status is monitored by {fleet}, including connectivity, check-in frequency, and related behavior, see the following:

How does {agent} connect to {fleet} to report its availability and health, and to receive policy updates?

After enrollment, {agent} regularly initiates a check-in to {fleet-server} using HTTP long-polling ({fleet-server} is deployed either on-premises or as part of {es} in {ecloud}).

The HTTP long-polling request is kept open until there’s a configuration change that {agent} needs to consume, an action is sent to the agent, or a 5-minute timeout elapses. After 5 minutes, the agent sends another check-in to start the process over again.

The frequency of check-ins can be configured to a new value, with the caveat that changing it may affect the maximum number of agents that can connect to {fleet}. Our regular scale testing of the solution doesn’t modify this parameter.
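
The following Go sketch illustrates the long-polling check-in pattern described above. It is a conceptual illustration, not {agent} source code: the {fleet-server} URL, the check-in endpoint path, the `checkInOnce` helper, and the 5-minute poll window constant are assumptions made for the example.

[source,go]
----
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// pollTimeout mirrors the default 5-minute long-poll window described above.
// The real check-in frequency is configurable; this constant is illustrative only.
const pollTimeout = 5 * time.Minute

// checkInOnce opens a single long-poll request to a check-in endpoint and blocks
// until the server responds (a policy change or a pending action) or the poll
// window elapses. The endpoint path is an assumption for this sketch.
func checkInOnce(ctx context.Context, client *http.Client, fleetURL string) error {
	ctx, cancel := context.WithTimeout(ctx, pollTimeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, fleetURL+"/api/fleet/agents/checkin", nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err // includes the timeout case: the agent simply checks in again
	}
	defer resp.Body.Close()
	fmt.Println("check-in completed with status:", resp.Status)
	return nil
}

func main() {
	client := &http.Client{} // no client-side timeout; the context bounds each poll
	for {
		if err := checkInOnce(context.Background(), client, "https://fleet-server.example.com:8220"); err != nil {
			fmt.Println("check-in ended:", err)
			time.Sleep(time.Second) // brief pause before re-opening the poll
		}
		// Otherwise the next long-poll is opened immediately, as described above.
	}
}
----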

Diagram of connectivity between agents

We use stack monitoring to monitor the status of our cluster. Does the monitoring of {agent} and the status shown in {fleet} use stack monitoring as well?

No. The health monitoring of {agent} and its inputs, as reported in {fleet}, is done completely outside of what stack monitoring provides.

There are many components that make up {agent}. How does {agent} ensure that these components/processes are up and running, and healthy?

{agent} is essentially a supervisor that, at a minimum, deploys a {filebeat} instance for log collection and a {metricbeat} instance for metrics collection from the system and the applications running on it. As a supervisor, it also ensures that these spawned processes are running and healthy. Using gRPC, {agent} communicates with the underlying processes once every 30 seconds to verify their health. If there’s no response, the agent transitions to an Unhealthy state, and the result and details are reported to {fleet}.
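
As an illustration of this supervision loop, the Go sketch below polls a component once every 30 seconds. It uses the standard gRPC health-checking protocol as a stand-in for the agent’s own control protocol, not the actual implementation, and the local component address is an assumption for the example.

[source,go]
----
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// superviseComponent checks a spawned component once every 30 seconds, matching
// the interval described above. The gRPC health-checking protocol stands in for
// the agent's own control protocol; the address is a hypothetical local endpoint.
func superviseComponent(addr string, unhealthy chan<- string) {
	conn, err := grpc.Dial(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial %s: %v", addr, err)
	}
	defer conn.Close()
	client := healthpb.NewHealthClient(conn)

	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		resp, err := client.Check(ctx, &healthpb.HealthCheckRequest{})
		cancel()
		if err != nil || resp.GetStatus() != healthpb.HealthCheckResponse_SERVING {
			// No response (or a bad one): report the component so the supervisor
			// can mark itself Unhealthy and surface the details upstream.
			unhealthy <- addr
		}
	}
}

func main() {
	unhealthy := make(chan string)
	go superviseComponent("127.0.0.1:6789", unhealthy) // hypothetical component endpoint
	for addr := range unhealthy {
		log.Printf("component %s failed its health check; the agent would report Unhealthy", addr)
	}
}
----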

If {agent} goes down, is an alert generated by {fleet}?

No. Alerts would have to be created in {kib} on the indices that show the total count of agents in each specific state. Refer to [fleet-alerting] in the {agent} monitoring documentation for the steps to configure alerting. Generating alerts on status changes for individual agents is planned for a future release.
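
As a starting point, the sketch below runs the kind of terms aggregation such an alert rule could be built on, counting agents per reported status. The `.fleet-agents` index name, the `last_checkin_status` field, the cluster URL, and the credentials are assumptions for the example; verify them against your own cluster before relying on them.

[source,go]
----
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Terms aggregation that buckets agents by their reported status.
	// Field and index names are assumptions for this sketch.
	query := []byte(`{
	  "size": 0,
	  "aggs": {
	    "agents_per_status": {
	      "terms": { "field": "last_checkin_status" }
	    }
	  }
	}`)

	req, err := http.NewRequest(http.MethodPost,
		"https://localhost:9200/.fleet-agents/_search", bytes.NewReader(query))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.SetBasicAuth("elastic", "changeme") // placeholder credentials

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // per-status bucket counts to alert on
}
----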

How long does it take for {agent} to report a status change?

Some {agent} states are reported immediately, such as when the agent becomes Unhealthy. Other states are derived after certain criteria are met. Refer to [view-agent-status] in the {agent} monitoring documentation for details about monitoring agent status.

The transition from an Offline state to an Inactive state is configurable by the user and can be fine-tuned by setting the inactivity timeout parameter.
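
A minimal sketch of tuning that parameter through the {kib} {fleet} API is shown below. The endpoint path, the `inactivity_timeout` field (in seconds), the example value, and the policy ID are assumptions for the example; the same setting can also be changed in the {fleet} UI under the agent policy settings.

[source,go]
----
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Example policy update body; "inactivity_timeout" is assumed to be in
	// seconds, and the value shown (two weeks) is only illustrative.
	body := []byte(`{
	  "name": "Default policy",
	  "namespace": "default",
	  "inactivity_timeout": 1209600
	}`)

	req, err := http.NewRequest(http.MethodPut,
		"https://localhost:5601/api/fleet/agent_policies/your-policy-id", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("kbn-xsrf", "true")      // Kibana APIs require this header
	req.SetBasicAuth("elastic", "changeme") // placeholder credentials

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("policy update status:", resp.Status)
}
----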