Agent/beats grpc comms over domain socket/named pipe #4249
This pull request does not have a backport label. Could you fix it @aleksmaus? 🙏
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
Co-authored-by: Leszek Kubik <39905449+intxgo@users.noreply.github.com>
Talked to @jrmolin. One issue is left: connection info discovery is still defined in the endpoint config as an IP port. We could potentially allow a base path in the endpoint spec for the domain socket/named pipe used by the connection info API. I'm thinking about what other options we could use within the current implementation. Thoughts? |
It seems simpler to just align this change in both endpoint and agent rather than introduce complexity to deal with this that we would remove as soon as endpoint supports domain sockets. The endpoint version is always aligned with the agent version so we ideally wouldn't have to release a version where endpoint doesn't know what to expect.
We could use a fixed base path for the unix socket connection instead of a fixed port. The socket/pipe would be owned by root/admin, so it is already more secure than an arbitrary port. |
I made this mode configurable. I suspect there may be cases where you want to use an IP socket, such as running agent and beats in separate pods. So the Endpoint would NOT know in advance whether an IP or domain socket is used for RPC. |
Looking at the code we can use the The Go based inputs all need to be updated to https://github.com/elastic/elastic-agent-client/releases/tag/v7.8.0 first which shouldn't be that hard to coordinate, endpoint is slightly more complicated. Thinking more about this it would actually be best if endpoint could handle either a TCP port or a domain socket/pipe. This way we can keep the configuration and if we hit an unexpected problem we have the ability to easily go back. So I'd say the path forward is:
|
Yes it is also a way to run the agent twice on the same machine since you can more easily change the port than the unix socket location. |
Endpoint will not install when the Elastic Agent is not installed in the default location: https://github.com/elastic/elastic-agent/blob/main/specs/endpoint-security.spec.yml#L21 This is because Endpoint protects the Elastic Agent directory and it's hard-coded to that directory. In that case the Elastic Agent should just create a unix domain socket inside its default installation directory for Endpoint to connect to and get the connection information. Windows is even easier: it should just use a defined npipe. I also agree that Endpoint should check the unix socket/npipe first and then fall back to checking the network port for the information. I also prefer that this just become the default. |
Taking into account all the basic hardening we've added to Endpoint recently (policy signing, Tamper Protection), we ended with a flow:
so for this question "how to tell Endpoint how it should talk to Agent" it's a matter of the bootstrap phase 1-4. Maybe the npipe/socket should be a plain Endpoint command line install parameter? I assume once Endpoint is configured with a given comms method it won't change. |
This is configurable, so it's possible to change. I don't know why anybody would do that, but it's possible. |
OK. Then it's not so straightforward. Maybe we could define more constraints, e.g. it's possible to change the comms method but it will require an Endpoint re-install? The latter would ensure that sufficient prerequisites are met to actually be able to uninstall Endpoint. |
Changing the comms method will require every component to restart at minimum I think. I don't love the complexity this brings, however I also don't like the idea of changing this without giving users who experience unexpected problems with it an escape hatch. We could require an agent restart but there's no way to initiate this from Fleet right now. Perhaps the agent automatically re-execs itself whenever this changes in the configuration.
One thing that isn't touched in the PR yet is that the bootstrap process for endpoint today also involves local TCP. We should change this to also use a unix/named pipe at a fixed location instead of a fixed TCP port so that you have to be root to connect to it.
elastic-agent/pkg/component/runtime/conn_info_server.go Lines 43 to 49 in d075097
The process today is:
We want both the bootstrapping and the control protocol gRPC address to be unix/named pipes. As Blake suggests, for the bootstrapping we can just use a fixed location for the bootstrapping pipe. That can then communicate the address of the gRPC pipe. We could also try to eliminate the need to bootstrap endpoint entirely since we can use a fixed pipe address for gRPC, but we'd still need a way to give endpoint the certs for mTLS. |
Yes, that's one of the remaining things for this feature to complete it and is on my TODO list. |
The C++ gRPC library that Endpoint uses doesn't support domain sockets or named pipes. There's a longstanding FR, but it's not implemented. We don't think we can land full gRPC-over-socket/npipe in 8.14, but we do think we can land the updated bootstrap. Would it be possible for Agent to support the new comms for bootstrap, while maintaining localhost gRPC TCP for Endpoint in 8.14? |
This would require some agent code rework. I don't know the scope of the changes yet and need to dig more. |
|
There are security benefits to using unix sockets/named pipes only for the bootstrap portion and leaving the rest unchanged. So there is value in doing this. I wouldn't bother using TCP only for gRPC communication with endpoint, we need to switch all the components over at once. I don't think we want to deal with two gRPC servers. I would think of the endpoint bootstrap improvement as a separate change from the change to the control protocol server. They are two different parts of agent. |
Added the commit that uses TCP gRPC for comms and forces domain socket/named pipe for the connection info server. @blakerouse I could not rely on the gRPC Port config (-1) in this case, since we were asked to keep gRPC on a TCP socket and only use a domain socket for the connection info server. With this change the connection info server is always forced to use a domain socket. |
We have been using unix sockets and named pipes without issue for collecting monitoring information from the Beats that run as part of agent for a long time (it is also used for the control socket) so the risk overall with this change is low. The scope of changing just the bootstrap process is also lower. Generally the only problems we've seen are hitting the arbitrary OS limits on the lengths of the names for the pipes. I also think the security benefits of this change are not as strong if we add a configuration flag to undo it. So for now I'd say we don't need a configuration flag for this as long as we can align it with endpoint merging the change at the same time. I could be argued into changing this opinion. |
I agree. |
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
👋 @aleksmaus, just wondering what your plans are for this PR, given the last activity on it was just over a month ago? Do you want to keep iterating on it or should we close it out? |
This PR is done and was waiting on the Endpoint PR to merge in order to pass the integration tests. The Endpoint PR was merged today. |
|
What does this PR do?
Implements configurable GRPC allowing the agent to use domain socket/named pipe for comms with beats.
The core of the support for this is already merged in the agent client lib elastic/elastic-agent-client#91 and is tagged as v7.8.0. Beats just work automagically since they already picked up this tag
https://github.com/elastic/beats/blob/main/go.mod#L72
Changes:
Local GRPC bool flag. Agent still uses IP socket by default.
I tested on all 3 platforms: darwin, linux and winderz.
Will reach out to Endpoint team, since they likely would need to adjust comms on Endpoint side.
This is the first cut, can change based on the feedback.
Open for feedback/opinions on:
Why is it important?
Addresses #4248
Checklist
./changelog/fragments using the changelog tool
Related issues