Switch to using agentbeat versus all the other beats from the beats repository #4516
Conversation
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Sonarqube is wrong; all the changes in this PR are fully covered by unit tests, and I have confirmed it with my own review of the coverage report. I assume it's counting the magefile contents in its percentage.
CI is broken so I am unable to get a green CI run on this PR, but I do see in the CI jobs that the elastic-agent was built with agentbeat. That is a great sign: it was built with agentbeat and CI was about to test elastic-agent with it.
LGTM pending the tests passing
@cmacknz I had to make some changes to allow the elastic-agent-cloud image to be built. I have tested that it works and allows filebeat and metricbeat to be executed from their original paths, all transparently running agentbeat. It would probably be good to get someone from cloud to review this change; I don't know who to ask.
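To picture how filebeat and metricbeat can keep their original paths while transparently running agentbeat, think of an argv-based shim: a wrapper invoked as filebeat or metricbeat re-executes agentbeat with the matching subcommand. This is only a sketch of the idea, not the actual wiring in the cloud image (which may use symlinks or shell wrappers instead):

```go
// Hypothetical shim, for illustration only: dispatch to agentbeat based on
// the name this binary was invoked as.
package main

import (
	"os"
	"os/exec"
	"path/filepath"
)

func main() {
	// The name the binary was invoked as, e.g. "filebeat" or "metricbeat".
	invoked := filepath.Base(os.Args[0])

	// "filebeat -e -c filebeat.yml" becomes "agentbeat filebeat -e -c filebeat.yml".
	args := append([]string{invoked}, os.Args[1:]...)
	cmd := exec.Command("agentbeat", args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr

	if err := cmd.Run(); err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			os.Exit(exitErr.ExitCode())
		}
		os.Exit(1)
	}
}
```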
The best thing to do for cloud is to test it by creating a deployment from your branch: https://github.com/elastic/elastic-agent?tab=readme-ov-file#testing-on-elastic-cloud. Make sure this branch has eca5bc7 or it will fail for unrelated reasons. There is also possibly an unrelated fleet-server issue you may see; that won't be fixed until the agentbeat CI fix is merged.
buildkite test it |
The runtime leak tests are failing and might need some adjustment: they try to get the beat stats directly from each component, and now all the components are the same process.
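One hypothetical adjustment, not the current test code: deduplicate components by PID before asserting on per-process stats, since every component now reports the same agentbeat process.

```go
// Hypothetical helper for the leak test: when several components report the
// same PID (they are all agentbeat now), only count each process once.
package leakcheck

type ComponentStats struct {
	ID      string
	PID     int
	Handles int
}

// DedupeByPID keeps one entry per unique PID so per-process metrics such as
// handle counts are not checked multiple times for the same process.
func DedupeByPID(stats []ComponentStats) []ComponentStats {
	seen := make(map[int]bool)
	out := make([]ComponentStats, 0, len(stats))
	for _, s := range stats {
		if seen[s.PID] {
			continue
		}
		seen[s.PID] = true
		out = append(out, s)
	}
	return out
}
```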
Very strange. I would have expected that to work the same, as I tried to ensure it didn't adjust any naming or paths. Trying to figure out what the issue is. Working on it.
I see this in the logs; not sure if it's the cause, but it's going to break the monitoring data:
"component":{"binary":"agentbeat","dataset":"elastic_agent.agentbeat","id":"log-default","type":"log"}
The dataset is now elastic_agent.agentbeat. I can see several "Cannot index event" errors that I believe are related:
{"log.level":"warn","@timestamp":"2024-04-17T17:46:53.872Z","message":"Cannot index event (status=403): dropping event! Enable debug logs to view the event and cause.","component":{"binary":"agentbeat","dataset":"elastic_agent.agentbeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"ecs.version":"1.6.0","log.logger":"elasticsearch","log.origin":{"file.line":429,"file.name":"elasticsearch/client.go","function":"github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).bulkCollectPublishFails"},"service.name":"filebeat","ecs.version":"1.6.0"}
I am seeing something a little different:
Something is definitely up with the monitoring sockets. There are 4 unique PIDs but only 3 unix sockets:
The filestream monitoring isn't getting a monitoring unix socket, and it appears as a component in the output of
I still see attempts to index to the non-existent data stream:
{"log.level":"warn","@timestamp":"2024-04-18T00:44:26.369Z","message":"Cannot index event (status=403): dropping event! Enable debug logs to view the event and cause.","component":{"binary":"agentbeat","dataset":"elastic_agent.agentbeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"elasticsearch","log.origin":{"file.line":429,"file.name":"elasticsearch/client.go","function":"github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).bulkCollectPublishFails"},"ecs.version":"1.6.0"}
The list of data streams for monitoring is hard-coded into Fleet, so we need a Kibana change for this to be accepted. Regardless of the permission issue, we need to use the correct beat alias (filebeat, metricbeat, etc.) to avoid breaking all of the elastic_agent dashboards.
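To illustrate what using the correct beat alias means, a hypothetical mapping (not the actual monitoring code) would derive the dataset from the agentbeat subcommand, so the data keeps landing in the data streams Fleet and the dashboards already expect:

```go
// Hypothetical sketch: name monitoring datasets after the underlying beat,
// not after the agentbeat binary.
package monitoring

// knownBeats lists the subcommands agentbeat exposes; assumed set, adjust as needed.
var knownBeats = map[string]bool{
	"filebeat": true, "metricbeat": true, "auditbeat": true,
	"packetbeat": true, "heartbeat": true, "osquerybeat": true,
}

// MonitoringDataset returns elastic_agent.<beat> for a known subcommand and
// falls back to elastic_agent.agentbeat otherwise.
func MonitoringDataset(subcommand string) string {
	if knownBeats[subcommand] {
		return "elastic_agent." + subcommand
	}
	return "elastic_agent.agentbeat"
}
```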
The logs ingestion test should be failing as well, giving us a stronger hint to the problem, but it only looks for metricbeat logs: elastic-agent/testing/integration/logs_ingestion_test.go lines 246 to 253 in d44fe9f
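A sketch of how that check could be broadened (assumed query shape using the standard data_stream.dataset field, not the actual test code): match any elastic_agent.* dataset instead of only the metricbeat one, so a binary rename like this would still surface.

```go
// Hypothetical query builder: look at every monitoring dataset rather than
// hard-coding elastic_agent.metricbeat.
package ingestion

func monitoringLogsQuery() map[string]interface{} {
	return map[string]interface{}{
		"query": map[string]interface{}{
			"wildcard": map[string]interface{}{
				"data_stream.dataset": "elastic_agent.*",
			},
		},
	}
}
```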
The unix socket connection errors are also in the agent logs:
{"log.level":"error","@timestamp":"2024-04-18T00:50:06.000Z","message":"Error fetching data for metricset beat.stats: error making http request: Get \"http://unix/stats\": dial unix /Library/Elastic/Agent/data/tmp/xTEtpJ7117ppc6OYvJCaYHbDW8mLjXGe.sock: connect: no such file or directory","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log":{"source":"beat/metrics-monitoring"},"ecs.version":"1.6.0","log.origin":{"file.line":256,"file.name":"module/wrapper.go","function":"github.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).fetch"},"service.name":"metricbeat","ecs.version":"1.6.0"}
The error detection logic should also be failing here, but I don't think it waits long enough to see all possible errors: https://github.com/blakerouse/elastic-agent/blob/d44fe9fac2a56d529ddf30833fd2b70535588249/testing/integration/logs_ingestion_test.go#L266-L269
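For example, a sketch only (assumed two-minute window, not the real test code): keep sampling the error count for a fixed period instead of checking once, so errors that only appear after the first monitoring collection interval still fail the run.

```go
// Hypothetical check: poll for error logs over a fixed window rather than a
// single pass, so slow-to-appear errors (like the missing-socket dials) count.
package ingestion

import (
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

func waitAndCheckForErrors(t *testing.T, countErrors func() (int, error)) {
	t.Helper()
	deadline := time.Now().Add(2 * time.Minute) // assumed window, tune as needed
	for time.Now().Before(deadline) {
		n, err := countErrors()
		require.NoError(t, err)
		require.Zerof(t, n, "found %d error logs before the window elapsed", n)
		time.Sleep(10 * time.Second)
	}
}
```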
@cmacknz The issue is actually in the specification.
PR to fix the specification - elastic/beats#39026 |
buildkite test this |
Now that elastic/beats#39026 has been merged, @pchila ran the tests locally and the failure is fixed.
|
What does this PR do?
Switches to using the agentbeat binary that comes from this PR: elastic/beats#38183

Why is it important?
Reduces the size of the shipped Elastic Agent by almost 50%.

Checklist
I have added an entry in ./changelog/fragments using the changelog tool.

How to test this PR locally

Related issues

Logs
Component logs are even correct, still showing metricbeat or filebeat when the binary actually being executed is agentbeat metricbeat or agentbeat filebeat respectively.
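For illustration, the log entries can keep saying filebeat or metricbeat because the per-beat identity can be derived from the subcommand rather than the executable name. A minimal sketch of that idea, not the actual agentbeat code:

```go
// Hypothetical illustration: pick the service name from the agentbeat
// subcommand instead of the executable name, which is why logs can still read
// "filebeat"/"metricbeat" even though the process is agentbeat.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func serviceName() string {
	exe := filepath.Base(os.Args[0]) // e.g. "agentbeat"
	if exe == "agentbeat" && len(os.Args) > 1 {
		return os.Args[1] // e.g. "filebeat" or "metricbeat"
	}
	return exe
}

func main() {
	fmt.Println("service.name =", serviceName())
}
```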