Implement zeroconf for operational node discovery #531

marcelveldt · 2024-02-07T08:41:59Z

Implemented zeroconf to do our own mDNS discovery of operational (and, later, commissionable) Matter nodes.
This solves a number of issues, most importantly that we no longer have to rely on the (bad) polling behavior of the SDK if a node goes unavailable. Instead, we can proactively detect it coming back online through mDNS discovery.

Initialize nodes on discovery over mDNS.
No longer try to resolve all nodes at startup (only live nodes will be discovered over mDNS now).
Mark a node offline if zeroconf reports it as offline.
Prepare for listening to commissionable nodes, to be picked up in a follow-up PR.
Some small polishes, such as denying commands to an unavailable node.
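A minimal sketch of how an mDNS callback could map discovery events to node availability. It assumes the Matter operational DNS-SD conventions (service type `_matter._tcp.local.`, instance name `<CompressedFabricID>-<NodeID>` in hex). To stay self-contained it uses a stand-in `StateChange` enum rather than python-zeroconf's `ServiceStateChange`; `NodeTracker`, `NodeUnavailableError` and `ensure_available` are hypothetical names, not the actual matter-server code.

```python
from enum import Enum

MDNS_TYPE_OPERATIONAL_NODE = "_matter._tcp.local."
MDNS_TYPE_COMMISSIONABLE_NODE = "_matterc._udp.local."  # for the follow-up PR


class StateChange(Enum):
    """Stand-in for zeroconf.ServiceStateChange (assumption for this sketch)."""

    ADDED = 1
    UPDATED = 2
    REMOVED = 3


class NodeUnavailableError(Exception):
    """Hypothetical error raised when a command targets an offline node."""


def parse_operational_name(name: str) -> tuple[int, int]:
    """Split '<fabric>-<node>._matter._tcp.local.' into (fabric_id, node_id)."""
    instance = name.removesuffix("." + MDNS_TYPE_OPERATIONAL_NODE)
    fabric_hex, node_hex = instance.split("-")
    return int(fabric_hex, 16), int(node_hex, 16)


class NodeTracker:
    """Tracks availability of our own nodes from mDNS events (hypothetical)."""

    def __init__(self, compressed_fabric_id: int, known_nodes: set[int]) -> None:
        self.compressed_fabric_id = compressed_fabric_id
        self.known_nodes = known_nodes
        self.available: dict[int, bool] = {}

    def on_service_state_change(self, name: str, state_change: StateChange) -> None:
        fabric_id, node_id = parse_operational_name(name)
        if fabric_id != self.compressed_fabric_id or node_id not in self.known_nodes:
            return  # another fabric, or a node we did not commission
        if state_change is StateChange.REMOVED:
            # zeroconf reports the node as gone: mark it offline
            self.available[node_id] = False
        else:
            # (re)discovered: real code would (re)initialize and subscribe here
            self.available[node_id] = True

    def ensure_available(self, node_id: int) -> None:
        """Deny commands to a node that is not currently available."""
        if not self.available.get(node_id, False):
            raise NodeUnavailableError(f"Node {node_id} is not available")
```

In a real integration, the handler would be registered on python-zeroconf's `AsyncServiceBrowser` for `MDNS_TYPE_OPERATIONAL_NODE`. zeroconf calls handlers with keyword arguments (`zeroconf`, `service_type`, `name`, `state_change`), so the signature above would need adapting.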

Fixes:

Continuous error logging from the SDK trying to resolve offline nodes.
When a Matter device was disconnected from power, it took hours before it was rediscovered once power was restored; this now takes seconds.

agners · 2024-02-07T12:08:30Z

on the (bad) polling behavior of the SDK if a node goes unavailable

What polling behavior are you referring to?

The rescheduled interview when a node stopped being available (not resolving) was on our end, no?

agners · 2024-02-07T12:18:02Z

matter_server/server/device_controller.py

+        # Remove and cancel any existing interview/subscription reschedule timer
+        if existing := self._sub_retry_timer.pop(node_id, None):
+            existing.cancel()
+        # shutdown existing subscriptions
+        if sub := self._subscriptions.pop(node_id, None):
+            await self._call_sdk(sub.Shutdown)


This code is also used when deleting the node; I think we should extract it into a common function.
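A minimal sketch of that extraction, assuming `_call_sdk` runs a blocking SDK call in an executor. `_sub_retry_timer`, `_subscriptions`, `_call_sdk`, `Shutdown` and `_node_offline` come from the diff; `_cleanup_node_subscription`, `remove_node` and `FakeSubscription` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Any, Callable


class FakeSubscription:
    """Hypothetical stand-in for the SDK subscription object."""

    def __init__(self) -> None:
        self.active = True

    def Shutdown(self) -> None:
        self.active = False


class DeviceController:
    def __init__(self) -> None:
        self._sub_retry_timer: dict[int, asyncio.TimerHandle] = {}
        self._subscriptions: dict[int, FakeSubscription] = {}

    async def _call_sdk(self, target: Callable[..., Any], *args: Any) -> Any:
        # Assumption: blocking SDK calls run in the default executor.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, target, *args)

    async def _cleanup_node_subscription(self, node_id: int) -> None:
        """Shared teardown for the offline path and node removal."""
        # Remove and cancel any existing interview/subscription reschedule timer
        if existing := self._sub_retry_timer.pop(node_id, None):
            existing.cancel()
        # Shut down the existing subscription, if any
        if sub := self._subscriptions.pop(node_id, None):
            await self._call_sdk(sub.Shutdown)

    async def _node_offline(self, node_id: int) -> None:
        await self._cleanup_node_subscription(node_id)
        # ...then mark the node unavailable and signal NODE_UPDATED

    async def remove_node(self, node_id: int) -> None:
        await self._cleanup_node_subscription(node_id)
        # ...then remove the node from the fabric and storage
```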

agners · 2024-02-07T12:18:40Z

matter_server/server/device_controller.py

-                self.server.signal_event(EventType.NODE_UPDATED, node)
+                # NOTE: if the node is (re)discovered by mdns, that callback will
+                # take care of resubscribing to the node
+                asyncio.create_task(self._node_offline(node_id))


I am a bit worried that we get false positives here. But let's see.

Btw, you should store a reference to the task somewhere; see also https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task and https://bugs.python.org/issue44665.

Important: Save a reference to the result of this function, to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks. A task that isn’t referenced elsewhere may get garbage collected at any time, even before it’s done. For reliable “fire-and-forget” background tasks, gather them in a collection:

marcelveldt added 2 commits February 7, 2024 09:28

Implement pyzeroconf for operational node discovery

767c782

typo

d731dea

marcelveldt added the new-feature New feature or request label Feb 7, 2024

marcelveldt added 3 commits February 7, 2024 09:53

reststructure offline node code

59b9531

typo

c68a878

some touches

f10037f

marcelveldt requested a review from MartinHjelmare February 7, 2024 10:33

MartinHjelmare approved these changes Feb 7, 2024

marcelveldt merged commit efeccd8 into main Feb 7, 2024
4 checks passed

marcelveldt deleted the zeroconf branch February 7, 2024 11:32

agners reviewed Feb 7, 2024
