nats-server (started as windows service) during lengthy jetstream recovery never reaches running state and is stuck in a restart loop. #6313
Labels
defect
Suspected defect such as a bug or regression
Comments
here is an example of such a restart loop. The first start shows a
Observed behavior
We noticed that nats-server instances started as a Windows service get stuck in a stop/restart loop when JetStream recovery takes too long and hits the Windows service startup timeout.
The reason appears to be that ReadyForConnections only returns after recovery is over.
Depending on the disks used as well as the amount of data stored, recovery time may vary quite a bit.
In our setup we have observed recovery times of more than 8 minutes.
We set `NATS_STARTUP_DELAY` to 15m in order to rule out interference from that setting. It appears that the Windows service startup timeout stops the process, causing recovery to start over on the next startup.
Expected behavior
This problem wouldn't happen if the server status were set to `svc.Running` before recovery starts. Alternatively, if `svc.Interrogate` were handled during recovery/ReadyForConnections and answered with an increased `CheckPoint` and a specified `WaitHint`, Windows wouldn't decide the service is unresponsive and stop it.
I looked at a number of Go Windows service examples and none seem to deal with a lengthy startup.
However, the Windows docs do mention what to do.
Windows docs
Golang windows svc
Server and client version
This was last observed with nats-server 2.10.20. The code in question has not been modified for a while, so I don't expect more recent versions to behave differently.
Host environment
No response
Steps to reproduce
No response