[windows] capture early start-up errors #4627

Closed
leehinman opened this issue Apr 26, 2024 · 8 comments · Fixed by #4846
Labels
Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@leehinman
Contributor

Describe the enhancement:

Capture standard error of elastic-agent when run as a Windows service. This isn't necessary under Linux since systemd will capture the standard error.

Describe a specific use case for the enhancement or feature:

There are several steps that happen before internal logging is started. If elastic-agent fails to start before internal logging is started the only place the error can be found is on the standard error of the elastic-agent process. When elastic-agent is run as a service under Windows, the standard error is not captured. This is a problem because elastic-agent can fail to start and there is no record of the failure. Adding this enhancement will allow us to "see" the error.

What is the definition of done?

When elastic-agent fails as a service under Windows the standard error of the process can be retrieved.

@leehinman leehinman added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent Label for the Agent team labels Apr 26, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@cmacknz
Member

cmacknz commented May 8, 2024

For Windows, Lee suggested the best way to do this would be to have agent write to the Windows Event Log.

@strawgate
Contributor

Does this run the risk of us creating a logging loop? Agent logs to the Event Log, Winlogbeat reads the Event Log, Winlogbeat crashes, Agent logs the crash to the Event Log, Winlogbeat reads the crash, and so on.

@cmacknz
Member

cmacknz commented May 8, 2024

We could narrow the scope of when we use the event log to only the period before we have our JSON logger set up and know where on disk those log files should be written.

This covers a point in time when no subprocesses (such as Winlogbeat) are running, so there is no risk of a logging loop. Winlogbeat could read a record of previous agent crashes, but by the time Winlogbeat is running, agent is no longer writing to the event log.

We have similar problems with the monitoring filestream instance, which needs some special handling and processors. I don't think we want to deal with any of that for regular uses of Winlogbeat.

@leehinman
Contributor Author

> We could narrow the scope of when we use the event log to only the period before we have our JSON logger set up and know where on disk those log files should be written.

I think we can make it even more limited in scope. If we just write an Event Log entry at

fmt.Fprintf(streams.Err, "Error: %v\n%s\n", err, troubleshootMessage())

then we only log to the Event Log if run fails, and the message would contain the same information we would normally get on stderr when running from the command line. So at most we would write one Event Log message, and only if elastic-agent run failed.

@blakerouse
Contributor

I hit this before with Windows and added this in the unprivileged work for Windows: https://github.com/elastic/elastic-agent/blame/main/internal/pkg/agent/cmd/run.go#L145

It doesn't cover all cases where it could fail, but it does a much better job than it did before. Logging to the Windows Event Log in the worst-case scenario would be nice to have.

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label May 21, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

7 participants