Delay marking a node as unavailable #568
Merged
After our recent change to proactive node detection using our own mDNS discovery, we also changed how and when we mark a node as unavailable/offline if it's not responding: we lowered the threshold in favor of re-detecting the node over mDNS.
However, we have received multiple reports that people now see their nodes go unavailable much more often.
We could argue that people should fix their networks/devices so they don't miss the subscription intervals, but in the real world dropouts do happen and bad devices do exist ;-)
I have investigated other platforms like Apple Home and Google Home, and they are all far more forgiving than us: they basically "hide" node unavailability for a while by debouncing it. This PR extends our thresholds to follow that pattern, but it still logs these events so people remain aware that something is going on in their network (and can hopefully fix it).
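To illustrate the debounce idea in isolation, here is a minimal Python sketch, assuming a single fixed grace period measured from the first missed subscription interval. The class `NodeAvailabilityDebouncer`, its methods, and the 60-second value are hypothetical illustrations, not the code or thresholds from this PR. Every miss is still logged, but the node only flips to unavailable once it has been silent for the whole grace period.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

LOGGER = logging.getLogger("node_availability_sketch")


@dataclass
class NodeAvailabilityDebouncer:
    """Hypothetical helper that debounces 'unavailable' transitions for one node."""

    node_id: int
    # Assumed value: seconds a node may stay silent before it is marked unavailable.
    grace_period: float = 60.0
    # Injectable clock so the behavior can be tested without sleeping.
    clock: Callable[[], float] = time.monotonic
    available: bool = field(default=True, init=False)
    _first_missed: Optional[float] = field(default=None, init=False)

    def report_missed_interval(self) -> bool:
        """Record a missed subscription interval; return the (possibly new) availability."""
        now = self.clock()
        if self._first_missed is None:
            # Start of a new outage window.
            self._first_missed = now
        # Always log, so the underlying network problem stays visible.
        LOGGER.warning("Node %s missed its subscription interval", self.node_id)
        if self.available and now - self._first_missed >= self.grace_period:
            # Only now surface the problem as 'unavailable'.
            self.available = False
            LOGGER.error("Node %s marked unavailable after %.0fs of silence",
                         self.node_id, now - self._first_missed)
        return self.available

    def report_alive(self) -> None:
        """Node responded again: close the outage window and mark it available."""
        if self._first_missed is not None:
            LOGGER.info("Node %s is responding again", self.node_id)
        self._first_missed = None
        self.available = True
```

With this design a short dropout (one or two missed intervals that recover within the grace period) only produces log lines, while a node that stays silent longer is still reported as unavailable.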
The outcome is that devices won't be marked unavailable in HA as quickly as before, basically matching the behavior of other platforms.