Elastic Agent enroll fails to restart daemon on docker #3628

Closed
AndersonQ opened this issue Oct 18, 2023 · 11 comments
Labels: bug, Team:Elastic-Agent, Team:Elastic-Agent-Control-Plane

Comments

@AndersonQ
Member

AndersonQ commented Oct 18, 2023

The Elastic Agent fails to restart its daemon during enrollment when running from the Docker image. The container exits with an error, but subsequent starts of the same container succeed.

The culprit is f7e558f

  • Version: main (8.12.0-SNAPSHOT), 8.11.0-SNAPSHOT
  • Operating System: all
  • Discuss Forum URL: N/A
  • Steps to Reproduce:
docker run \
  --env FLEET_ENROLL=1 \
  --env FLEET_URL=https://fleet-url:8220/ \
  --env FLEET_ENROLLMENT_TOKEN=SOME_TOKEN \
  --env FLEET_INSECURE=true \
  docker.elastic.co/beats/elastic-agent:8.12.0-SNAPSHOT

Some of our tests are failing:

logs:

root@elastic-agent:~# docker run \
  --env FLEET_ENROLL=1 \
  --env FLEET_URL=https://some.fleet.url:port \
  --env FLEET_ENROLLMENT_TOKEN=SOME_TOKEN \
  --env FLEET_INSECURE=true \
  docker.elastic.co/beats/elastic-agent:8.12.0-SNAPSHOT

{"log.level":"info","@timestamp":"2023-10-18T16:09:56.069Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":497},"message":"Starting enrollment to URL: https://fc2e07ab4001499380ce57a763e698fd.fleet.us-east-1.aws.staging.elastic.cloud:443/","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-10-18T16:10:19.256Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":468},"message":"Retrying to restart...","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-10-18T16:10:59.261Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":468},"message":"Retrying to restart...","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-10-18T16:11:59.262Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":468},"message":"Retrying to restart...","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-10-18T16:12:59.265Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":468},"message":"Retrying to restart...","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-10-18T16:13:59.266Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":468},"message":"Retrying to restart...","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-10-18T16:13:59.266Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":280},"message":"Elastic Agent might not be running; unable to trigger restart: could not reload agent's daemon, all retries failed. Last error: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /usr/share/elastic-agent/state/data/tmp/elastic-agent-control.sock: connect: no such file or directory\"","ecs.version":"1.6.0"}
Something went wrong while enrolling the Elastic Agent: could not reload agent's daemon, all retries failed. Last error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /usr/share/elastic-agent/state/data/tmp/elastic-agent-control.sock: connect: no such file or directory"
Error: could not reload agent daemon, unable to trigger restart: could not reload agent's daemon, all retries failed. Last error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /usr/share/elastic-agent/state/data/tmp/elastic-agent-control.sock: connect: no such file or directory"
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.12/fleet-troubleshooting.html
Error: enrollment failed: exit status 1
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.12/fleet-troubleshooting.html


root@elastic-agent:~# docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES


root@elastic-agent:~# docker ps --all
CONTAINER ID   IMAGE                                                   COMMAND                  CREATED         STATUS                      PORTS     NAMES
e07749ec9372   docker.elastic.co/beats/elastic-agent:8.12.0-SNAPSHOT   "/usr/bin/tini -- /u…"   4 minutes ago   Exited (1) 38 seconds ago             happy_visvesvaraya


root@elastic-agent:~# docker start e07749ec9372
e07749ec9372


root@elastic-agent:~# docker logs -f e07749ec9372
{"log.level":"info","@timestamp":"2023-10-18T16:14:49.613Z","log.origin":{"file.name":"cmd/run.go","file.line":155},"message":"Elastic Agent started","log":{"source":"elastic-agent"},"process.pid":7,"agent.version":"8.12.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-10-18T16:14:49.823Z","log.origin":{"file.name":"upgrade/rollback.go","file.line":113},"message":"agent is not upgradable, not starting watcher","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-10-18T16:14:49.823Z","log.origin":{"file.name":"cmd/run.go","file.line":242},"message":"APM instrumentation disabled","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
AndersonQ added the bug label on Oct 18, 2023
AndersonQ self-assigned this on Oct 18, 2023
@cmacknz
Member

cmacknz commented Oct 18, 2023

Let's revert the commit that caused this so the 8.11 and 8.12 branches are fixed quickly while we figure out how to fix this properly.

cmacknz added the Team:Elastic-Agent label on Oct 18, 2023
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@cmacknz
Member

cmacknz commented Oct 18, 2023

The original commits causing this have now been reverted.

@pierrehilbert
Contributor

I created #3732 to make it easier to test this change.

@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ycombinator
Contributor

I believe this issue is no longer Blocked. @cmacknz can you confirm?

@cmacknz
Member

cmacknz commented Dec 30, 2024

We can both write tests against containers run in Kubernetes and against the container command running on a normal Linux host, so I don't see how this would be blocked.

@swiatekm
Contributor

As far as I can tell, this was fixed in #3815, including an e2e test that enrolls a containerized agent into Fleet. Tested manually with 8.16 and it worked correctly there as well. @belimawr @cmacknz let me know if I'm missing something here.

@belimawr
Contributor

> @belimawr @cmacknz let me know if I'm missing something here.

@AndersonQ originally filed this issue; he can assess it better than I can.

@AndersonQ
Member Author

Yeah, it was fixed. I don't remember why this issue wasn't linked to the fix.

@swiatekm
Contributor

Fixed in #3815.

7 participants