Change the default gRPC port to 0 when in a container #6585
Conversation
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Looks good.
The day sockets/npipe can be used for all installation types will be a glorious day!
LGTM 👍 this change will help a lot with k8s deployments
* Override the container command's gRPC port to 0 by default.
* Test that two containers don't have port collisions on the same host.
* Move container override next to regular defaults.
* Add changelog.
* Fix application.New call in unit test.
* Silence lint warning.

(cherry picked from commit a61ad8c)

Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
What does this PR do?
Changes the default gRPC port to 0 when the agent runs in a container. At least one user's Elastic Agent rollout is experiencing collisions with another application on the same port (surely nobody else would ever pick 6789 as a default port...), and I thought this would be straightforward to fix quickly. I was wrong: I expected ~2 hours and spent 3 days. The relevant locations:
- `elastic-agent/_meta/config/elastic-agent.docker.yml.tmpl`, lines 88 to 89 in 9b8a25f
- `elastic-agent/deploy/helm/elastic-agent/templates/agent/k8s/_secret.tpl`, line 5 in 9b8a25f
- The `run()` function that is the entrypoint of the agent already has a hook for overriding configuration: `elastic-agent/internal/pkg/agent/cmd/run.go`, lines 426 to 428 in 4199196
- `elastic-agent/internal/pkg/agent/cmd/container.go`, line 769 in 4199196
- `elastic-agent/internal/pkg/agent/cmd/run.go`, line 415 in 9b8a25f
- `elastic-agent/internal/pkg/agent/application/application.go`, line 99 in 4199196
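To make the shape of the change concrete, here is a minimal sketch of the idea behind the container override. The `Config` type, field names, and `applyContainerOverrides` function are all illustrative, not the actual elastic-agent types; the real change hooks into the configuration override point in `run()` mentioned above.

```go
package main

import "fmt"

// Config loosely mirrors the agent configuration relevant here; the
// field name is illustrative, not the real elastic-agent config type.
type Config struct {
	GRPCPort uint16
}

// applyContainerOverrides sketches the PR's behavior: when running in
// container mode, default the gRPC port to 0 so the OS picks a free
// ephemeral port, unless the user explicitly configured a port.
func applyContainerOverrides(cfg *Config, inContainer, userSetPort bool) {
	if inContainer && !userSetPort {
		cfg.GRPCPort = 0 // port 0 = let the kernel choose a free port
	}
}

func main() {
	cfg := &Config{GRPCPort: 6789} // the old fixed default
	applyContainerOverrides(cfg, true, false)
	fmt.Println(cfg.GRPCPort)
}
```

The key design point is that the override only replaces the *default*: an explicitly configured port is left untouched, so existing deployments that pinned a port keep working.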
Note: We should eventually stop using local TCP at all for the control protocol between sub-processes and switch to unix sockets / named pipes. The capability for this was added in #4249, but it was left disabled because endpoint-security doesn't support it yet (the upstream gRPC C++ client doesn't support it on Windows). We have now removed endpoint-security from our containers, which would allow us to switch to unix sockets there, but this change requires an elastic-agent-client package update and we need to test that every client we have has it first. That was more testing effort than I wanted to take on now, but I will create a follow-up issue to do this.
Why is it important?
This automatically avoids port collisions between Elastic Agents running with `hostNetwork: true` on Kubernetes (as our DaemonSet does by default) and other applications, or other Elastic Agents, on the same host.

Disruptive User Impact
None. But in case I'm wrong about this, I made it possible to choose a specific port with an environment variable.
How to test this PR locally