Restart snapshotter gracefully even if there are running containers #151

Open
luodw opened this issue Sep 22, 2020 · 8 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

luodw commented Sep 22, 2020

Recently, I have tried containerd's lazy-loading feature. stargz-snapshotter uses a FUSE filesystem to fetch file data on demand: in the FUSE userspace handler, if the file is cached locally it is returned directly; if not, it is fetched from the Docker registry.
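
The cache-or-fetch behavior described above can be sketched roughly as follows. This is only an illustration of the read-through pattern, not stargz-snapshotter's actual code: `lazyReader`, `fetchFunc`, and the in-memory map are hypothetical names and simplifications, and the "registry" is a stub function.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc is a hypothetical stand-in for a read against the remote registry.
type fetchFunc func(name string) ([]byte, error)

// lazyReader serves file contents from a local cache and falls back to
// fetch on a cache miss. All names here are illustrative assumptions.
type lazyReader struct {
	mu    sync.Mutex
	cache map[string][]byte
	fetch fetchFunc
}

func newLazyReader(fetch fetchFunc) *lazyReader {
	return &lazyReader{cache: make(map[string][]byte), fetch: fetch}
}

// Read returns the file data and reports whether it came from the cache.
func (r *lazyReader) Read(name string) ([]byte, bool, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if data, ok := r.cache[name]; ok {
		return data, true, nil // cached locally: return directly
	}
	data, err := r.fetch(name) // not cached: go to the remote side
	if err != nil {
		return nil, false, err
	}
	r.cache[name] = data
	return data, false, nil
}

func main() {
	calls := 0
	r := newLazyReader(func(name string) ([]byte, error) {
		calls++
		return []byte("contents of " + name), nil
	})
	for i := 0; i < 2; i++ {
		data, hit, err := r.Read("/bin/bash")
		fmt.Println(string(data), hit, err)
	}
	fmt.Println("remote fetches:", calls)
}
```

In this sketch the second read is served from the cache, so only one remote fetch happens. If the process serving these reads dies, nothing is left to answer either kind of request, which is the problem raised below.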

My question is: if stargz-snapshotter restarts abnormally or is restarted during an upgrade, the FUSE connections break, so reads by container processes will fail. Is there a good practice for this?

Member

ktock commented Sep 23, 2020

@luodw Thanks for the question! Although we have graceful shutdown on SIGINT (#26), recovery from abnormal shutdown and support for service restart are in progress (#134). Contributions are very welcome.

luodw closed this as completed Sep 23, 2020
luodw reopened this Sep 23, 2020
Author

luodw commented Sep 23, 2020

> @luodw Thanks for the question! Although we have graceful shutdown on SIGINT (#26), recovery from abnormal shutdown and support for service restart are in progress (#134). Contributions are very welcome.

Thanks for your reply, I got it.

Member

ktock commented Sep 24, 2020

@luodw Can you check whether the master version (which contains patch #134) fixes this issue?

Author

luodw commented Sep 24, 2020

> @luodw Can you check whether the master version (which contains patch #134) fixes this issue?

I have tried the latest master branch (which contains patch #134), but when I 'kill -9 ' and restart immediately, the container still gets errors.

The following steps reproduce the issue:

  1. ctr-remote images rpull docker.io/stargz/golang:1.12.9-esgz
  2. ctr-remote run --rm -t --snapshotter=stargz docker.io/stargz/golang:1.12.9-esgz test /bin/bash
  3. kill -9 and restart immediately
  4. run some commands in the container

Member

ktock commented Sep 25, 2020

Currently, you need to re-run the containers too. I agree that the snapshotter needs to be able to restart gracefully even if there are running containers.

ktock changed the title from "If stargz-snapshotter daemon process restart?" to "Restart snapshotter gracefully even if there are running containers" Sep 25, 2020
ktock added the enhancement label Sep 25, 2020
Author

luodw commented Sep 25, 2020

> Currently, you need to re-run the containers too. I agree that the snapshotter needs to be able to restart gracefully even if there are running containers.

OK, I also think the ideal behavior is that when the snapshotter restarts, the running containers keep running normally.

ktock added the help wanted label Oct 7, 2020
@amrmahdi

@ktock Can you describe what is required to update or restart the snapshotter in a running cluster, for instance? How do you do that today?

Member

ktock commented Jan 15, 2021

Currently, we need to kill all containers running on that node before restarting this snapshotter and re-deploy these containers after the snapshotter restarts.

One idea for solving this issue is to spawn the FUSE server as a separate process instead of as a goroutine, as is done today.
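
A rough sketch of what launching such a separate process could look like in Go is below. This is an assumption-laden illustration, not a design from this project: the `fuse-server` binary name, the mountpoint argument, and `fuseServerCommand` are all hypothetical. `Setsid` (Unix only) puts the child in its own session so it is not tied to the parent's terminal or process group; how a restarted snapshotter would find and reconnect to the still-running server is not covered here.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// fuseServerCommand builds (but does not start) a command that would run a
// FUSE server as its own process in a new session. The binary name and its
// single mountpoint argument are hypothetical.
func fuseServerCommand(binary, mountpoint string) *exec.Cmd {
	cmd := exec.Command(binary, mountpoint)
	// Detach from the parent's session so the child can outlive it.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}
	return cmd
}

func main() {
	cmd := fuseServerCommand("fuse-server", "/tmp/mnt")
	fmt.Println("would run:", cmd.Args, "in new session:", cmd.SysProcAttr.Setsid)
	// cmd.Start() would launch the process; it is not called here because
	// the binary is only a placeholder.
}
```

With a goroutine, the FUSE server shares the snapshotter's fate; with a separate process, the mount could in principle stay served while the snapshotter itself is replaced. Note that a service manager may still terminate child processes when the service restarts, depending on how it is configured.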
