Agent does not apply an invalid proxy for fleet-server, but shows as unhealthy #4472
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Are we sure the config is not applied for the fleet-server part?
I could check again, but yes, the agent was not applying the config. A simple test is to reproduce the issue, fix the proxy in the policy, and observe that the agent reports as healthy again.
Why is the Fleet status healthy but the agent status isn't? The reason we use a separate Fleet status in the first place was so we'd stop considering transient Fleet errors a reason why the agent would be unhealthy (and if the agent is offline, it can't report Fleet status anyway). The error appears to be coming from: elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_policy_change.go Lines 192 to 197 in d558694
I think that function might be globally setting the agent status regardless of where it was called from: elastic-agent/internal/pkg/remote/client.go Lines 209 to 213 in d558694
I thought it was a global-ish error state for the Fleet client, but perhaps it isn't. As you pointed out, the Fleet status is healthy, which is correct. And paying more attention to the error, it starts with [...]. I had a quick look at the code, and I believe this is where the error is collected and set on the agent status: elastic-agent/internal/pkg/agent/application/coordinator/coordinator_state.go Lines 201 to 203 in ad7e1b5
What clears that error once it is set? Another successful action?
@cmacknz, IIRC, yes, a successful action would clear the error. @pierrehilbert @cmacknz it's still relevant, right?
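To illustrate the behavior described above (an error set by a failed action stays until a later successful action clears it), here is a minimal sketch. `actionErrorState`, `Record`, `Healthy`, and `errInvalidProxy` are hypothetical names invented for this example; they are not the coordinator's actual types or API.

```go
package main

import (
	"errors"
	"sync"
)

// actionErrorState is a hypothetical stand-in for the error the agent keeps
// after a failed action. It is NOT the elastic-agent coordinator type.
type actionErrorState struct {
	mu  sync.Mutex
	err error
}

// Record stores the outcome of the latest action. Passing nil (success)
// overwrites, and therefore clears, an error left by an earlier failure.
func (s *actionErrorState) Record(err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.err = err
}

// Healthy reports whether the most recently recorded action succeeded.
func (s *actionErrorState) Healthy() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.err == nil
}

// errInvalidProxy is an illustrative error value only.
var errInvalidProxy = errors.New("policy change rejected: invalid fleet proxy")
```

Under this model, calling `Record(errInvalidProxy)` leaves the state unhealthy until some later call to `Record(nil)`, which would match the observation that fixing the proxy in the policy restores the healthy status.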
I would say this is very relevant. Perhaps even related to this: https://github.com/elastic/ingest-dev/issues/3234
@nimarezainia, what do you mean by informing the user before the config is applied? I'm wondering if you mean somehow testing it before sending it to the agents.
I believe Nima is referring to a two-phase commit protocol, which I don't think we want to focus on right now. Basically, all agents report back to Fleet Server that a new config is valid (e.g. "prepare"), and only then does the "commit" phase happen, where all agents apply the new config.
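For readers unfamiliar with the idea, the prepare/commit flow described above could be sketched roughly as follows. This is only an illustration: `participant`, `memAgent`, and `rollOut` are hypothetical names, the config is reduced to a string, and nothing here reflects how Fleet or the agent is implemented.

```go
package main

import "fmt"

// participant is a hypothetical view of an agent in a two-phase rollout.
type participant interface {
	Prepare(cfg string) error // validate the config without applying it
	Commit(cfg string)        // apply a config that every agent accepted
}

// memAgent is an in-memory fake used only for illustration.
type memAgent struct {
	reject  bool   // when true, Prepare reports the config as invalid
	applied string // last committed config
}

func (m *memAgent) Prepare(cfg string) error {
	if m.reject {
		return fmt.Errorf("config %q failed validation", cfg)
	}
	return nil
}

func (m *memAgent) Commit(cfg string) { m.applied = cfg }

// rollOut commits cfg only if every participant accepts it in the prepare
// phase. If any participant rejects it, no participant is changed.
func rollOut(cfg string, agents []participant) error {
	for i, a := range agents {
		if err := a.Prepare(cfg); err != nil {
			return fmt.Errorf("agent %d rejected config, nothing committed: %w", i, err)
		}
	}
	for _, a := range agents {
		a.Commit(cfg)
	}
	return nil
}
```

The point of the two loops is that a single rejection in the first phase leaves every agent on its previous config.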
Yes, a two-phase commit would work. Many of these configs (as @AndersonQ stated) would need to be tested at the agent itself. I am thinking mainly of connectivity-related configurations, like the connection to Fleet Server, Outputs, or the Download source: before that config is applied, test whether you even have a route to the endpoint, then apply/commit the configuration. If the test fails, don't change the config, and flag this. We don't want a small mistake in the configuration to bring down the whole Fleet.
Steps to Reproduce:
The status does eventually clear if you delete the incorrect proxy.