Implement zeroconf for operational node discovery #531

Merged
merged 5 commits into from
Feb 7, 2024

Collaborator

@marcelveldt marcelveldt commented Feb 7, 2024

Implemented zeroconf to do our own mdns discovery of operational (and later commissionable) Matter nodes.
This solves a number of issues, most importantly that we no longer have to rely on the (bad) polling behavior of the SDK if a node goes unavailable. Instead, we can proactively detect it coming back online through mdns discovery.

  • Initialize nodes on discovery over mdns
  • No longer try to resolve all nodes at startup (only the live ones will now be discovered over mdns).
  • Mark node offline if zeroconf reports it as offline.
  • Prepare listening for commissionable nodes, to be picked up in a follow-up PR.
  • Some small polish, such as denying commands to an unavailable node.
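
The discovery flow in the list above could be sketched roughly as follows. This is a minimal, library-free illustration, not the code from this PR: `MdnsState`, `NodeTracker`, and `parse_operational_name` are hypothetical names, and the enum only stands in for the added/updated/removed notifications that a zeroconf service browser delivers. It assumes the Matter operational instance-name layout of `<compressed fabric id>-<node id>` (16 hex digits each) under `_matter._tcp.local.`.

```python
from __future__ import annotations

from enum import Enum

MATTER_OPERATIONAL_TYPE = "_matter._tcp.local."


class MdnsState(Enum):
    """Hypothetical stand-in for the state changes a service browser reports."""

    ADDED = "added"
    UPDATED = "updated"
    REMOVED = "removed"


def parse_operational_name(name: str) -> tuple[int, int] | None:
    """Return (compressed_fabric_id, node_id), or None if the name does not match."""
    suffix = "." + MATTER_OPERATIONAL_TYPE
    if not name.endswith(suffix):
        return None
    instance = name[: -len(suffix)]
    fabric_hex, sep, node_hex = instance.partition("-")
    if not sep or len(fabric_hex) != 16 or len(node_hex) != 16:
        return None
    try:
        return int(fabric_hex, 16), int(node_hex, 16)
    except ValueError:
        return None


class NodeTracker:
    """Hypothetical tracker that marks known nodes online or offline."""

    def __init__(self, known_node_ids: set[int]) -> None:
        self.known = known_node_ids
        self.available: set[int] = set()

    def on_service_state_change(self, name: str, state: MdnsState) -> None:
        parsed = parse_operational_name(name)
        if parsed is None:
            return
        # A real server would presumably also check the fabric id; skipped here.
        _fabric_id, node_id = parsed
        if node_id not in self.known:
            return
        if state is MdnsState.REMOVED:
            # mdns reports the record as gone: treat the node as offline.
            self.available.discard(node_id)
        else:
            # Added or updated: the node is live, so (re)initialize it.
            self.available.add(node_id)
```

In a real server the add/update branch would be the place to set up the node and its subscription, rather than just updating a set.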

Fixes:

  • Continuous error logging of the SDK trying to resolve offline nodes.
  • If a Matter device lost power, it took hours before it was rediscovered once power was restored; this now takes seconds.

@marcelveldt marcelveldt added the new-feature New feature or request label Feb 7, 2024
@marcelveldt marcelveldt merged commit efeccd8 into main Feb 7, 2024
4 checks passed
@marcelveldt marcelveldt deleted the zeroconf branch February 7, 2024 11:32
@agners
Collaborator

agners commented Feb 7, 2024

on the (bad) polling behavior of the SDK if a node goes unavailable

What polling behavior are you referring to?

The rescheduled interview when a node stopped being available (not resolving) was on our end, no?

Comment on lines +1253 to +1258
# Remove and cancel any existing interview/subscription reschedule timer
if existing := self._sub_retry_timer.pop(node_id, None):
    existing.cancel()
# shutdown existing subscriptions
if sub := self._subscriptions.pop(node_id, None):
    await self._call_sdk(sub.Shutdown)
Collaborator


This code is also used when deleting the node; I think we should extract it into a common function.
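
One way the suggested extraction might look, as a sketch only: `SubscriptionManager`, `_cleanup_node_subscription`, and `remove_node` are hypothetical names, and `_call_sdk` here is a trivial placeholder rather than the server's real SDK dispatch. Only the attribute names `_sub_retry_timer` and `_subscriptions` are taken from the snippet under review.

```python
import asyncio


class SubscriptionManager:
    """Hypothetical holder for per-node retry timers and subscriptions."""

    def __init__(self) -> None:
        self._sub_retry_timer: dict[int, asyncio.TimerHandle] = {}
        self._subscriptions: dict[int, object] = {}

    async def _call_sdk(self, func, *args):
        # Placeholder: the real server dispatches SDK calls differently.
        return func(*args)

    async def _cleanup_node_subscription(self, node_id: int) -> None:
        """Cancel any reschedule timer and shut down the node's subscription."""
        if existing := self._sub_retry_timer.pop(node_id, None):
            existing.cancel()
        if sub := self._subscriptions.pop(node_id, None):
            await self._call_sdk(sub.Shutdown)

    async def _node_offline(self, node_id: int) -> None:
        await self._cleanup_node_subscription(node_id)
        # ... mark the node unavailable and signal listeners ...

    async def remove_node(self, node_id: int) -> None:
        await self._cleanup_node_subscription(node_id)
        # ... remove the node from storage ...
```

Both call sites then share one cleanup path, so a later change to the teardown order only has to be made once.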

self.server.signal_event(EventType.NODE_UPDATED, node)
# NOTE: if the node is (re)discovered by mdns, that callback will
# take care of resubscribing to the node
asyncio.create_task(self._node_offline(node_id))
Collaborator


I am a bit worried that we get false positives here. But let's see.

Btw, you should store the task somewhere; see also https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task and https://bugs.python.org/issue44665.

Important: Save a reference to the result of this function, to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks. A task that isn’t referenced elsewhere may get garbage collected at any time, even before it’s done. For reliable “fire-and-forget” background tasks, gather them in a collection:
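
The pattern that the quoted passage describes can be sketched like this. The helper name `create_background_task` and the module-level `background_tasks` set are illustrative assumptions, not code from this PR; the idea is simply to hold a strong reference until the task finishes.

```python
import asyncio

# Strong references to in-flight tasks, so they cannot be garbage collected.
background_tasks: set[asyncio.Task] = set()


def create_background_task(coro) -> asyncio.Task:
    """Schedule a coroutine and keep a reference until it completes."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task is done, so the set does not grow forever.
    task.add_done_callback(background_tasks.discard)
    return task
```

Applied to the line under review, the call would become something like `create_background_task(self._node_offline(node_id))`.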

3 participants