Necessity of linger on exit for servers that time out #63
Replies: 10 comments 10 replies
-
This man page for socket close suggests it may not be possible through the existing NNG interface: https://nng.nanomsg.org/man/tip/nng_close.3.html
-
In the case of long-running computation, it seems like this would matter most when sending the result of a completed task back to the client, rather than when receiving data for a new task. In the former case, would it be possible for the server to pause its idle timers (and any similar exit conditions) before initiating a send? Unless I am missing something, it seems like this would just be a matter of expressing the timer logic differently in R.
-
The issue is that we can do whatever we like before the send, or afterwards for that matter, but we simply do not know when it has finished. That is an interplay between the C process and the system TCP stack, which R has no access to at present.
-
That makes sense. By the way, this discussion made me concerned that a server could exit and lose the data long before the client has a chance to download it. I am happy to see that the results of lightweight tasks seem to remain available somewhere well after the server exits. On my company's cluster, I started a dispatcher on one node:
library(mirai)
url <- sprintf("ws://%s:57000", getip::getip())
print(url)
daemons(
n = 1L,
url = url,
dispatcher = TRUE,
token = FALSE
)
while (!is.matrix(daemons()$daemons)) {
Sys.sleep(0.1)
}
while (daemons()$daemons[, "online"] < 1L) {
Sys.sleep(0.1)
}
tasks <- replicate(4, mirai(rnorm(n = 1)))
Sys.sleep(4)
print(as.numeric(lapply(tasks, function(task) task$data)))
During the
The server visibly came and went, and the client did not attempt to collect the data until a couple of seconds after that. And yet no result went missing!
print(as.numeric(lapply(tasks, function(task) task$data)))
#> [1] 1.3502759 -0.2049120 0.1465165 -0.5801425
This is really amazing. Where do the results live between the server exit and the moment the client starts to collect them?
-
Ha, yes, TCP is surprisingly resilient.
The send is eager, so it is done while the server is still alive. This does assume, though, that transmission finishes before the 'exitlinger' period elapses and the process dies.
I believe the data is simply buffered at the client (listener) TCP socket, so it can be collected at any time by NNG.
-
Seems like there would have to be new logic. Just for the sake of thinking out loud:
Is this all possible? Am I missing something? I'm not sure if (4) is possible because the dispatcher is non-polling. Without polling, I suppose a callback mechanism would be needed, and from #42 (comment) it sounds like a callback mechanism does not exist at the NNG level.
-
It's just a question of efficiency. You can always do something like have the dispatcher send an ack when it receives the result from the server, and have the server wait for that ack before exiting. Just sending messages will be more efficient than establishing a new pipe as in (4). However, this will mean having a 'receive task' state at the server, followed by a 'receive ack' state. Probably robust, but likely 'something they did 30 years ago'... And I think this would have to be done for every task, as I don't think there's a good way for the server to signal 'I want to exit, send an ack next time'.
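To make the two-state idea concrete, here is a rough sketch in plain R. It does not use mirai or NNG at all: the "pipe" is just a pair of in-memory queues, and every name (`new_channel`, `push`, `pop`, `server_step`, `dispatcher_step`) is made up for illustration. It also assumes a server that exits after a single task.

```r
# Hypothetical in-memory stand-in for a pipe: one FIFO queue per direction.
new_channel <- function() {
  ch <- new.env()
  ch$to_server <- list()
  ch$to_dispatcher <- list()
  ch
}
push <- function(ch, dir, msg) ch[[dir]] <- c(ch[[dir]], list(msg))
pop <- function(ch, dir) {
  q <- ch[[dir]]
  if (length(q) == 0L) return(NULL)
  ch[[dir]] <- q[-1L]
  q[[1L]]
}

# Server: 'receive_task' -> run task, send result -> 'receive_ack' -> 'exit'.
server_step <- function(ch, state) {
  msg <- pop(ch, "to_server")
  if (is.null(msg)) return(state)  # nothing has arrived yet
  if (state == "receive_task" && msg$type == "task") {
    push(ch, "to_dispatcher", list(type = "result", value = msg$fun()))
    return("receive_ack")
  }
  if (state == "receive_ack" && msg$type == "ack") return("exit")
  state  # any other message is simply ignored in this sketch
}

# Dispatcher: on receiving a result, store it and acknowledge it.
dispatcher_step <- function(ch, results) {
  msg <- pop(ch, "to_dispatcher")
  if (!is.null(msg) && msg$type == "result") {
    push(ch, "to_server", list(type = "ack"))
    results <- c(results, msg$value)
  }
  results
}

ch <- new_channel()
push(ch, "to_server", list(type = "task", fun = function() 1 + 1))
state <- server_step(ch, "receive_task")   # runs the task, sends the result
results <- dispatcher_step(ch, numeric())  # stores the result, sends the ack
state <- server_step(ch, state)            # sees the ack, may now exit
```

The point is only that the server never reaches 'exit' until the dispatcher has confirmed receipt, at the cost of one extra message per task.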
-
Eliminated the 'exitlinger' in ephemeral (dot) daemons through synchronisation techniques in 35e94ec. The message receive completion callback automatically closes the pipe, which signals a condition variable on the daemon so that it wakes and proceeds to exit. Need to consider how best to implement something similar for the other cases.
-
With the latest commit 51ec228 (requiring dev nanonext), the exitlinger is likely to be eliminated after the next round of releases. Whilst this commit doesn't get us there yet, it seems solvable. Notably, this commit does solve another important issue: the 'backlogged workers' problem of always having to re-launch these daemons (i.e. wlandau/crew#79 (comment)). @wlandau this should hopefully also allow the
-
This is solved as of 7d6f4c2. In all cases there is a fallback to the global timeout of 5s to ensure exiting processes never hang.
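For anyone wondering what "wait for completion, but fall back to a timeout" looks like in general, below is a minimal sketch in base R. It is not how the commit does it (that relies on condition variables at the C level); this version just polls a hypothetical `done()` predicate, and `wait_for`, `timeout` and `interval` are illustrative names.

```r
# Wait until done() returns TRUE, but never longer than `timeout` seconds.
# Returns TRUE if the condition was met, FALSE if the fallback timeout hit.
wait_for <- function(done, timeout = 5, interval = 0.01) {
  deadline <- Sys.time() + timeout
  while (Sys.time() < deadline) {
    if (done()) return(TRUE)
    Sys.sleep(interval)
  }
  FALSE
}

# Condition already satisfied: returns immediately.
finished <- wait_for(function() TRUE)

# Condition never satisfied: gives up after the (shortened) timeout.
timed_out <- !wait_for(function() FALSE, timeout = 0.05)
```

Either way the caller proceeds to exit, so a lost peer can delay the exit by at most the timeout.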
-
As servers have the option to time out, or to 'task out' after a set number of tasks, it would be ideal to exit the process immediately thereafter. At present, however, this is only possible after an 'exitlinger' period, which is set to 1s by default. This should be sufficient for sending objects of ~1GB in size.
What is currently not possible is for exit to be conditional upon the send being completed.
This is, I believe, due to:
It would be great if a solution could be found.
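To illustrate the current behaviour being described, here is a simplified sketch of such a server loop in base R. It is not the actual mirai code: `run_server`, `next_task` and `send` are hypothetical stand-ins, and only the argument names `maxtasks`, `idletime` and `exitlinger` are meant to mirror the options discussed here.

```r
# Simplified server loop: exit after `maxtasks` tasks or `idletime` seconds
# without work, then linger for a fixed period before returning.
run_server <- function(next_task, send, maxtasks = Inf, idletime = Inf,
                       exitlinger = 1) {
  count <- 0L
  last_active <- Sys.time()
  repeat {
    task <- next_task()
    if (is.null(task)) {
      idle <- as.numeric(difftime(Sys.time(), last_active, units = "secs"))
      if (idle >= idletime) break  # timed out
      Sys.sleep(0.01)
      next
    }
    send(task())                   # returns before transmission is confirmed
    count <- count + 1L
    last_active <- Sys.time()
    if (count >= maxtasks) break   # 'tasked out'
  }
  Sys.sleep(exitlinger)            # fixed wait: completion of the send is unknown
  count
}

# Toy usage with an in-memory queue standing in for the network.
queue <- list(function() 1, function() 2)
sent <- list()
next_task <- function() {
  if (length(queue) == 0L) return(NULL)
  task <- queue[[1L]]
  queue <<- queue[-1L]
  task
}
send <- function(x) sent[[length(sent) + 1L]] <<- x
n_done <- run_server(next_task, send, maxtasks = 2L, exitlinger = 0)
```

The final `Sys.sleep(exitlinger)` is the part in question: it is a guess at how long the last send needs, not a check that it has completed.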