Linux docker agent gets Unhealthy on adding linux integration. #2377
Comments
@manishgupta-qasource Please review. |
Secondary review for this ticket is Done |
@amolnater-qasource Looks like we faced some permission issues: @fearful-symmetry does it ring a bell or should I ask the obs-service team to look at this specific integration first? |
@jlind23 There could be a few issues here; The original issue mentions docker, so it's possible that we need to set |
@amolnater-qasource could you please check what @fearful-symmetry said? On a side note, was this particular docker distribution working before now? |
Thank you for looking into this issue. We observed Further, this issue was earlier observed during 8.5.0 SNAPSHOT testing, reported under #1454 Please let us know if we are missing anything here. |
@amolnater-qasource can't you ssh in this container and see if it is mounted? Are you relying on a different base docker image? |
Ah, brain skipped a beat, just noticed that it's actually a permissions error: I'm fairly certain that |
Hi @jlind23 For testing the docker agent we followed the steps below:
So, as per our understanding we aren't creating any new container for this and we are using this docker image for installing an agent. Please let us know if we are missing anything here. |
@fearful-symmetry would be great to have your eyes on this as soon as you have time to make sure this is not a regression we introduced in metricbeat. |
Hi Team, We have revalidated this issue on latest 8.8 BC6 Kibana cloud environment and found it still reproducible. Observations:
Logs: Build details:
Please let us know if anything else is required from our end. Thanks! |
This is a new error in the system metrics input:
|
@amolnater-qasource can you try to reproduce? I want to see if this error happens every time or is intermittent to assess the severity of the problem. |
@cmacknz normally that error would be thrown by the |
Hi @cmacknz Thank you for looking into this. The issue is reproducible every time the linux integration with all datasets enabled is added to the agent policy. Agents: Host OS's: Build details:
Logs: Please let us know if anything else is required from our end. Thanks! |
@fearful-symmetry yes this is supported, we support both Ubuntu 22 and Google container optimized OS on ARM64 per https://www.elastic.co/support/matrix
Raising priority, adding to the next sprint since this happens every time. |
Going to look into this more tomorrow, but what I think is happening is that because we're running in a container, the dbus socket for the host isn't reachable inside the container. Pretty sure there's an environment variable we can set that's used by the coreos libraries. I don't think this is documented anywhere, which is a bit of a problem. |
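A minimal sketch of how one might check, from inside the container, which system bus socket a D-Bus client would try and whether it is actually there. It assumes the environment variable hinted at above is `DBUS_SYSTEM_BUS_ADDRESS` (the standard D-Bus variable; the thread does not name it) and uses the socket path from the agent errors reported in this issue as the fallback. The function names are illustrative only and are not part of elastic-agent or the coreos libraries.

```python
import os
import stat

# Fallback address when DBUS_SYSTEM_BUS_ADDRESS is unset. The socket path is
# the one that appears in the agent errors reported in this issue.
DEFAULT_ADDRESS = "unix:path=/var/run/dbus/system_bus_socket"


def system_bus_socket_path(env=None):
    """Return the unix socket path of the system bus, or None if the
    address does not use a unix:path= transport."""
    env = os.environ if env is None else env
    # Assumption: DBUS_SYSTEM_BUS_ADDRESS is the variable in question.
    address = env.get("DBUS_SYSTEM_BUS_ADDRESS") or DEFAULT_ADDRESS
    # A D-Bus address may list several alternatives separated by ';'.
    for candidate in address.split(";"):
        transport, _, params = candidate.partition(":")
        if transport != "unix":
            continue
        for pair in params.split(","):
            key, _, value = pair.partition("=")
            if key == "path":
                return value
    return None


def socket_is_present(path):
    """True if path exists and is a unix socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    path = system_bus_socket_path()
    print(f"system bus socket: {path}")
    print(f"present: {path is not None and socket_is_present(path)}")
```

Running this inside the agent container only shows whether the socket is visible at the resolved path; it does not prove that the agent is allowed to talk to it.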
Thanks @fearful-symmetry for looking into this. If your assumption is right, putting up a doc PR would definitely be enough for this. |
@amolnater-qasource Can you try:
|
Thank you for sharing the details over slack and helping us revalidate this. Please find the details of the attempted test below:
We observed that the installed agent is Unhealthy and had the errors below: Agent Logs: Please let us know if anything else is required from our end. |
Update while I look into this: I think there's some kind of formatting issue with the env var happening between the |
Alright, found the issue, extremely dumb bug. There's two different versions of the Fix is here: elastic/beats#35618 |
Hi Team, We have revalidated this issue on latest 8.9.0 BC3 Kibana cloud environment and found it still reproducible. Observations:
Build details:
Screen Recording:
94504372f98a.-.Agents.-.Fleet.-.Elastic.-.Google.Chrome.2023-07-11.10-56-46.mp4
Agents.-.Fleet.-.Elastic.-.Google.Chrome.2023-07-11.11-06-29.mp4
Logs: Hence, we are reopening this issue. |
@fearful-symmetry could you please have a look? |
Seems like this is dbus again:
- id: system/metrics-default
  state:
    state: 2
    message: 'Healthy: communicating with pid ''31'''
    units:
      ? unittype: 0
        unitid: system/metrics-default-system/metrics-system-331804e9-c84e-40e0-beae-805672378572
      : state: 4
        message: '[failed to reload inputs: 2 errors: Error creating runner from config:
          1 error: error connecting to dbus: dial unix /var/run/dbus/system_bus_socket:
          connect: no such file or directory; Error creating runner from config: 1
          error: error connecting to dbus: error getting connection to system bus:
          dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory]'
      ? unittype: 0
elastic/beats#35618 was supposed to fix this I believe. |
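For triage, the unit error above can be reduced to the socket path and the failure reason. The sketch below is only an illustration: the regular expression is written against the message format shown in this comment, and the hint texts are hypothetical, not output from elastic-agent.

```python
import re

# Matches Go-style unix dial failures such as:
#   dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
DIAL_ERROR = re.compile(r"dial unix (?P<path>\S+): connect: (?P<reason>[a-z ]+)")


def dbus_dial_failures(message):
    """Return sorted unique (socket_path, reason) pairs found in message."""
    # YAML folding can leave newlines and indentation inside the message.
    flat = " ".join(message.split())
    return sorted(
        {(m["path"], m["reason"].strip()) for m in DIAL_ERROR.finditer(flat)}
    )


def hint(reason):
    """Map a dial failure reason to a hypothetical triage hint."""
    if reason == "no such file or directory":
        return "socket not visible in the container; check the volume mount"
    if reason == "permission denied":
        return "socket visible but not accessible; check user and security profile"
    return "unrecognized failure; inspect the agent logs"


if __name__ == "__main__":
    sample = (
        "[failed to reload inputs: 2 errors: Error creating runner from config: "
        "1 error: error connecting to dbus: dial unix "
        "/var/run/dbus/system_bus_socket: connect: no such file or directory]"
    )
    for path, reason in dbus_dial_failures(sample):
        print(f"{path}: {reason} -> {hint(reason)}")
```

Both errors in the message point at the same socket with the same reason, so the set collapses them into a single entry.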
@amolnater-qasource is that the exact docker command? If you're using the dbus-related metricsets you need to add I suspect this isn't well documented; going to hunt around the system docs and see if I can find where we should put this. |
Alright, tested with
Seems to work fine. |
Closing this as fixed then, and I approved your doc PR. |
Thank you for the confirmation and adding the docs. We have re-attempted to install agent on docker with below updated commands:
Second:
Screen Recording: Agents.-.Fleet.-.Elastic.-.Google.Chrome.2023-07-12.10-09-22.mp4
For troubleshooting we also tried adding the config below to the linux integration. However, the agent still remained Unhealthy. Logs: Please let us know if we are missing anything here. Thank you |
A little baffled by this, since I'm seeing tons of errors that seem to suggest that the We might want to take care to create the policy with |
Thank you for looking into this again.
To get the logs, we reattempted with two different sets of commands for running the agent:
Debug Logs for this agent are: Second Command:
Agent logs for this agent are: Please let us know if we are missing anything here. Thanks! |
Ah, there we go:
It looks like AppArmor is stopping the dbus Hello message, which isn't something I think I've ever seen before. @amolnater-qasource can you tell me precisely what ubuntu release this is so I can try and document some kind of workaround? The output of |
Huzzah, was able to reproduce this. Interestingly, this only seems to happen with docker, which is probably why we haven't seen this before. |
So, we can temporarily work around this by adding
This doesn't seem like the best solution, and I'd like to come up with a more targeted apparmor role. |
@amolnater-qasource Is this still an issue you face? |
Hi @jlind23 We have revalidated this issue on latest 8.14.0 BC5 kibana cloud environment and found it still reproducible with the actual command:
Observations:
Agent Logs: We were expecting this to fix as per #2377 (comment) Please let us know if anything else is required from our end. Thanks! |
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
Yes this is the same error originally detected in #2377 (comment). |
Kibana version: 8.7 BC6 Kibana cloud environment
Host OS:
Ubuntu 22 ARM64
Build details:
VERSION: 8.7 BC6
BUILD: 61051
COMMIT: 04ef24287f26854ad99a46ae983854c6184717cb
Preconditions:
Steps to reproduce:
Note:
Expected Result:
Docker agent should remain healthy on adding linux integration.
Screen Recording:
Agents.-.Fleet.-.Elastic.-.Google.Chrome.2023-03-16.23-07-34.mp4
Logs:
elastic-agent-diagnostics-2023-03-16T17-37-58Z-00.zip
elastic-agent-diagnostics-2023-03-16T17-43-24Z-00.zip