Currently, it seems that when a runner pod loses its connection to the controller, it does not retry and just stops.
Would it be possible to modify this so that it retries when the connection is lost?
For example, if the controller is suddenly evicted due to a node drain, all runners lose their connection and stop.
In my case, the termination wait time is quite long, so all runners remain in a terminating state for over 20 minutes.
Instead, wouldn’t it be better to prevent the node from being drained while a runner is running, at least until the controller has finished communicating with that runner?
Am I misunderstanding something?
I’d like to hear your opinion.
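To make the retry request concrete, here is a minimal sketch of the behavior I have in mind. It is not the project's actual code: `retry_with_backoff`, the `connect` callable, and all parameter defaults are hypothetical names and values chosen for illustration, and I am assuming a lost connection surfaces as a `ConnectionError`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or max_attempts is exhausted.

    `connect` is a hypothetical stand-in for whatever the runner does
    to (re)establish its connection to the controller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff capped at max_delay, with random jitter
            # so that many runners do not all reconnect at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The jitter matters in the node-drain scenario described above: if every runner loses the controller at once, a fixed retry interval would make them all reconnect simultaneously when the controller comes back.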