ATTENTION: Node x (yz) did not complete setup in 30 minutes. - Nodes stuck in setup #807
The behavior stems from the auto-resubscribe flag. The code as it is today does not distinguish between the first read attempt (on subscription establishment) and subsequent re-subscriptions. In both cases

Now this is problematic since we do not have a handle for the subscription at that point. So from the Python side, the async read call just takes forever. In the SDK, the

We could use a

```python
try:
    # Bound the initial read/subscribe attempt to 120 seconds.
    sub: Attribute.SubscriptionTransaction = await asyncio.wait_for(
        self._chip_device_controller.read_attribute(
            node_id,
            [()],
            events=[("*", 1)],
            return_cluster_objects=False,
            report_interval=(interval_floor, interval_ceiling),
            auto_resubscribe=True,
        ),
        120,
    )
except asyncio.TimeoutError:
    node_logger.error("Subscription attempt timed out!!!")
    raise
```

This would allow us to continue and try again. However, the subscription will linger around. This is because the

Sidenote: It seems that such a timeout bubbles up into the task, leading to an unhandled task exception shown once exiting the server (this is modified code, but maybe it would be better to catch all exceptions in
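A minimal sketch of the timeout-and-retry idea above, under stated assumptions: `subscribe_with_timeout`, `fake_subscribe` and `demo` are hypothetical names, and `fake_subscribe` merely stands in for the real `read_attribute(...)` call. Only `asyncio.wait_for` and `asyncio.TimeoutError` are real APIs here. Note that `asyncio.wait_for` only cancels the Python-side awaitable; the sketch does nothing about a subscription lingering on the SDK side.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def subscribe_with_timeout(subscribe, *, timeout, attempts, retry_delay):
    """Await the hypothetical `subscribe()` coroutine factory with a per-attempt timeout.

    Returns the first successful result, or raises asyncio.TimeoutError
    once every attempt has timed out.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(subscribe(), timeout)
        except asyncio.TimeoutError:
            logger.error("Subscription attempt %d/%d timed out", attempt, attempts)
        if attempt < attempts:
            await asyncio.sleep(retry_delay)
    raise asyncio.TimeoutError(f"no subscription after {attempts} attempts")


def demo(hang_first_n):
    """Run the helper against a fake subscribe call that hangs on its first N invocations."""
    calls = 0

    async def fake_subscribe():
        # Hypothetical stand-in for the real read_attribute(...) call.
        nonlocal calls
        calls += 1
        if calls <= hang_first_n:
            await asyncio.sleep(1)  # far longer than the timeout used below
        return f"subscription-{calls}"

    return asyncio.run(
        subscribe_with_timeout(fake_subscribe, timeout=0.05, attempts=3, retry_delay=0)
    )
```

The caller still has to catch the final `asyncio.TimeoutError`; otherwise it propagates out of the task, as described in the sidenote.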
With PR project-chip/connectedhomeip#34370 the behavior will change: If the initial subscription attempt fails, then the
This fixes Subscription issues. Mainly it avoids being stuck in subscription setup if there is a communication issue with the device. Fixes: #807
With #623 an error got introduced if our node setup takes longer than expected.
What seems to happen is that the SDK gets stuck during the initial read/subscription setup.
In a case at hand (Meross Smart Plug, after power cycling it because it fell off the WiFi network), with SDK Progress logging enabled, the following sequence of events happened:
Controller was running, the device was controllable before, but the Meross Smart Plug seemingly crashed/lost connection to the WiFi network.
Restart of the Smart Plug
The device got rediscovered
The reason for that is simply that the CASE session was no longer valid on the Meross Smart Plug side. This is expected, since the Meross Smart Plug power cycled and does not support persistent CASE subscriptions.
Now this is expected to work. However, for whatever reason the device seems to have crashed right away. Maybe because we (and other controllers too, I have it connected to multiple Controllers) tried to communicate with an invalid CASE session, but this is speculation. The Meross devices are Matter 1.0 devices and are known to be a bit unstable. Anyhow, what the SDK now tries to do is to reestablish the CASE session continuously.
From there the cycle continues forever, the Controller attempts to find the device for 45s (in this case after a retry delay of 8.9s):
The logs do not show the `Previous subscription failed with Error: %s, re-subscribing in %s ms...` logs, which means `resubscription_attempted` does not get called. The reason is that `SetResubscriptionAttemptedCallback()` has not been called at this point yet. In other words, because the very initial read already failed, the subscription hasn't been completely set up at this point.

Eventually, after I restarted the device again, the setup completed (no Controller restart was needed):