Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Groups don't survive reboot (CON-863) #713

Open
jonsmirl opened this issue Nov 2, 2023 · 7 comments
Open

Groups don't survive reboot (CON-863) #713

jonsmirl opened this issue Nov 2, 2023 · 7 comments

Comments

@jonsmirl
Copy link
Contributor

jonsmirl commented Nov 2, 2023

Maybe an IPv6 issue?

  1. Make a Matter group with a Device A as a light switch

  2. Bind the switch to the group

  3. Add Device B light to group -- both devices are on same wifi router.

  4. Verify that Device A can send commands to device B. Like toggle the light on/off.

  5. this works for me as expected,

  6. Reboot device A and don't reboot Device B

  7. Try using the grouped switch on Device A
    For me it doesn't work.

  8. Reboot the wifi router. I have Amazon Eero router.

  9. Now switch will work.

Is something wrong with how the IPv6 multicast was constructed? The CHIP log on device A says it is sending the request, but device B does not receive it.

I will need to build a promiscuous wifi sniffer to see if Device A is sending the packets. Right now I can't tell if Device A is not sending or if the router is not retransmitting. You guys likely have the right equipment around to check this.

Update -- MDNS works after reboot. Note that Matter is using IPv6 groups. But is MDNS using IPv6 or IPv4?

@github-actions github-actions bot changed the title Groups don't survive reboot Groups don't survive reboot (CON-863) Nov 2, 2023
@wqx6
Copy link
Contributor

wqx6 commented Nov 6, 2023

I cannot reproduce this error. Could you please share the commands you sent to the switch and the light and check if there is the log of chip[IN]: Joining Multicast Group with address UDP after you reboot the switch?

@jonsmirl
Copy link
Contributor Author

jonsmirl commented Nov 6, 2023

My current opinion is that it is the router causing this. IPv6 groups did not work at all on my old router. Now I have bought a new Amazon Eero one. If we assume it is the router, is there anything that can be done from the ESP side to force the router to rebuild the group? I am still testing to try and understand what is happening.

It would be very useful if CHIP people put together some Android/IOS apps which could be used to test IPv6 multicast groups. Then you could use two or three phones to test routers. I suspect there are a lot of home routers with various IPv6 bugs.

@jonsmirl
Copy link
Contributor Author

jonsmirl commented Nov 6, 2023

Can you try reflashing when you reboot? I have not changed the fabric ID and group number. I did not erase the partition with the commissioning data in it.

What I think is going on that when the node reboots and hooks to the multicast group, it is not hooking to the same group as before the reboot. Instead it is making a new one which is not connected to the previous group. I tried using my wifi sniffer to look at it, but everything is encrypted so I don't know what is going on.

I have six Eeros connected over Ethernet backbone. So the multi-cast packets have to propagate through that network for things to work right.

@jonsmirl
Copy link
Contributor Author

jonsmirl commented Nov 6, 2023

What are these new errors, I started getting them after I updated CHIP.

I (359340729) chip[EVL]: Dropped 1 event from buffer with priority 1 and event number  0x0000000000070049 due to overflow: event priority_level: 1
I (359340989) chip[EVL]: Dropped 1 event from buffer with priority 1 and event number  0x000000000007004A due to overflow: event priority_level: 1
I (359341149) chip[EVL]: Dropped 1 event from buffer with priority 1 and event number  0x000000000007004B due to overflow: event priority_level: 1
I (359337479) chip[EM]: <<< [E:36749i S:34997 M:121391162] (S) Msg TX to 2:AB0DE00000000000 [4507] [UDP:192.168.1.72%st1:48013] --- Type 0001:05 (IM:ReportData)
I (359337619) chip[EM]: >>> [E:36749i S:34997 M:160426002 (Ack:121391162)] (S) Msg RX from 2:AB0DE00000000000 [4507] --- Type 0001:01 (IM:StatusResponse)
I (359337619) chip[IM]: Received status response, status is 0x00
I (359337619) chip[EM]: <<< [E:36749i S:34997 M:121391163 (Ack:160426002)] (S) Msg TX to 2:AB0DE00000000000 [4507] [UDP:192.168.1.72%st1:48013] --- Type 0000:10 (SecureChannel:StandaloneAck)
I (359340719) h: active_key 4
I (359340719) ep: Initial button pressed 1
I (359340719) esp_matter_attribute: ********** W : Endpoint 0x0005's Cluster 0x0000003B's Attribute 0x00000001 is 1 **********
I (359340719) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 EP 5 CL 59 AT 1
I (359340719) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 1008009233 4
I (359340719) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 17 4
I (359340719) ep: app_driver_attribute_update EP 5 CL 59 AT 1 PTR 0x3c159138
I (359340729) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 EP 5 CL 59 AT 1
I (359340729) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 btype 1008009233 4
I (359340729) chip[ZCL]: SwitchServer: OnInitialPress
I (359340729) chip[EVL]: Dropped 1 event from buffer with priority 1 and event number  0x0000000000070049 due to overflow: event priority_level: 1
I (359340739) esp_matter_attribute: ********** R : Endpoint 0x0005's Cluster 0x0000003B's Attribute 0x00000001 is 1 **********
I (359340739) chip[EM]: <<< [E:36750i S:34997 M:121391164] (S) Msg TX to 2:AB0DE00000000000 [4507] [UDP:192.168.1.72%st1:48013] --- Type 0001:05 (IM:ReportData)
I (359340839) chip[EM]: >>> [E:36750i S:34997 M:160426003 (Ack:121391164)] (S) Msg RX from 2:AB0DE00000000000 [4507] --- Type 0001:01 (IM:StatusResponse)
I (359340839) chip[IM]: Received status response, status is 0x00
I (359340839) chip[EM]: <<< [E:36750i S:34997 M:121391165 (Ack:160426003)] (S) Msg TX to 2:AB0DE00000000000 [4507] [UDP:192.168.1.72%st1:48013] --- Type 0000:10 (SecureChannel:StandaloneAck)
I (359340959) h: active_key 4
I (359340959) ep: Short button release 1
I (359340959) esp_matter_attribute: ********** R : Endpoint 0x0005's Cluster 0x00000006's Attribute 0x00000000 is 0 **********
I (359340959) chip[ZCL]: Toggle ep5 on/off from state 0 to 1
I (359340959) esp_matter_attribute: ********** R : Endpoint 0x0005's Cluster 0x00000006's Attribute 0x0000FFFC is 1 **********
I (359340959) esp_matter_attribute: ********** R : Endpoint 0x0005's Cluster 0x00000006's Attribute 0x00004001 is 0 **********
I (359340959) chip[ZCL]: On Command - OffWaitTime :  0
I (359340959) esp_matter_attribute: ********** W : Endpoint 0x0005's Cluster 0x00000006's Attribute 0x00004002 is 0 **********
I (359340959) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 EP 5 CL 6 AT 16386
I (359340969) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 1008009233 4
I (359340969) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 17 4
I (359340969) ep: app_driver_attribute_update EP 5 CL 6 AT 16386 PTR 0x3c159138
I (359340969) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 EP 5 CL 6 AT 16386
I (359340969) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 btype 1008009233 4
I (359340969) chip[ZCL]: On/Toggle Command - Stop Timer
I (359340969) esp_matter_attribute: ********** W : Endpoint 0x0005's Cluster 0x00000006's Attribute 0x00004000 is 1 **********
I (359340969) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 EP 5 CL 6 AT 16384
I (359340969) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 1008009233 4
I (359340969) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 17 4
I (359340969) ep: app_driver_attribute_update EP 5 CL 6 AT 16384 PTR 0x3c159138
I (359340969) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 EP 5 CL 6 AT 16384
I (359340969) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 btype 1008009233 4
I (359340969) esp_matter_attribute: ********** W : Endpoint 0x0005's Cluster 0x00000006's Attribute 0x00000000 is 1 **********
I (359340969) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 EP 5 CL 6 AT 0
I (359340969) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 1008009233 4
I (359340969) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 17 4
I (359340969) ep: app_driver_attribute_update EP 5 CL 6 AT 0 PTR 0x3c159138
I (359340969) ep: Power 1 RGB 255 204 165
I (359340969) ep: Update button position 5 power 1 R 255 G 204 B 165
I (359340969) ep: Button pos 5 LED 2:  63 41 51, rgb 255 204 165
I (359340979) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 EP 5 CL 6 AT 0
I (359340979) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 btype 1008009233 4
I (359340979) esp_matter_attribute: ********** W : Endpoint 0x0005's Cluster 0x00000005's Attribute 0x00000003 is 0 **********
I (359340979) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 EP 5 CL 5 AT 3
I (359340979) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 1008009233 4
I (359340979) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 17 4
I (359340979) ep: app_driver_attribute_update EP 5 CL 5 AT 3 PTR 0x3c159138
I (359340979) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 EP 5 CL 5 AT 3
I (359340979) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 btype 1008009233 4
I (359340979) esp_matter_attribute: ********** W : Endpoint 0x0005's Cluster 0x0000003B's Attribute 0x00000001 is 0 **********
I (359340979) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 EP 5 CL 59 AT 1
I (359340979) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 1008009233 4
I (359340979) ep: app_attribute_update_cb type 0 priv 0x3fcb5d94 btype 17 4
I (359340979) ep: app_driver_attribute_update EP 5 CL 59 AT 1 PTR 0x3c159138
I (359340979) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 EP 5 CL 59 AT 1
I (359340979) ep: app_attribute_update_cb type 1 priv 0x3fcb5d94 btype 1008009233 4
I (359340979) chip[ZCL]: SwitchServer: OnShortRelease
I (359340989) chip[EVL]: Dropped 1 event from buffer with priority 1 and event number  0x000000000007004A due to overflow: event priority_level: 1
I (359340989) ep: group_cb
I (359340989) chip[-]: Group CB cluster 6 command 1
I (359340999) chip[EM]: <<< [E:36751i S:549 M:133068707] (G) Msg TX to 2:FFFFFFFFFFFF0225 [4507] [UDP:[FF35:40:FDFA:B000::1D00:225]:5540] --- Type 0001:08 (IM:InvokeCommandRequest)
I (359340999) ep: Command On
I (359341139) h: active_key 4
I (359341139) ep: Multipress Complete 1
I (359341149) chip[ZCL]: SwitchServer: OnMultiPressComplete
I (359341149) chip[EVL]: Dropped 1 event from buffer with priority 1 and event number  0x000000000007004B due to overflow: event priority_level: 1
I (359341749) esp_matter_attribute: ********** R : Endpoint 0x0005's Cluster 0x00000005's Attribute 0x00000006 is 0 **********
I (359341749) esp_matter_attribute: ********** R : Endpoint 0x0005's Cluster 0x00000005's Attribute 0x00000003 is 0 **********
I (359341749) esp_matter_attribute: ********** R : Endpoint 0x0005's Cluster 0x00000006's Attribute 0x00000000 is 1 **********
I (359341749) esp_matter_attribute: ********** R : Endpoint 0x0005's Cluster 0x00000006's Attribute 0x00004000 is 1 **********
I (359341749) esp_matter_attribute: ********** R : Endpoint 0x0005's Cluster 0x00000006's Attribute 0x00004002 is 0 **********
I (359341759) esp_matter_attribute: ********** R : Endpoint 0x0005's Cluster 0x0000003B's Attribute 0x00000001 is 0 **********
I (359341769) chip[EM]: <<< [E:36752i S:34997 M:121391166] (S) Msg TX to 2:AB0DE00000000000 [4507] [UDP:192.168.1.72%st1:48013] --- Type 0001:05 (IM:ReportData)
I (359341839) chip[EM]: >>> [E:36752i S:34997 M:160426004 (Ack:121391166)] (S) Msg RX from 2:AB0DE00000000000 [4507] --- Type 0001:01 (IM:StatusResponse)
I (359341839) chip[IM]: Received status response, status is 0x00
I (359341849) chip[EM]: <<< [E:36752i S:34997 M:121391167 (Ack:160426004)] (S) Msg TX to 2:AB0DE00000000000 [4507] [UDP:192.168.1.72%st1:48013] --- Type 0000:10 (SecureChannel:StandaloneAck)
I (359341869) app_main: Dump #17967 Free internal 108592 biggest block 43008 total 108592
I (359350589) ROUTE_HOOK: Ignore invalid ICMP packet
I (359350589) ROUTE_HOOK: Received RIO
I (359350589) ROUTE_HOOK: prefix FD87:31C3:3CE

here's my code calling the APIs

       // short press release
        ESP_LOGI(TAG, "Short button release %d", position);

        app::CommandHandler commandObj(nullptr);
        // Press moves Position from 0 (idle) to 1 (press)
        lock::chip_stack_lock(portMAX_DELAY);
        if (position == UP)
        {
            OnOff::Commands::On::DecodableType commandData;
            app::ConcreteCommandPath commandPath = {epLightID, OnOff::Id, OnOff::Commands::On::Id};
            emberAfOnOffClusterOnCallback(&commandObj, commandPath, commandData);
        }
        else
        {
            OnOff::Commands::Off::DecodableType commandData;
            app::ConcreteCommandPath commandPath = {epLightID, OnOff::Id, OnOff::Commands::Off::Id};
            emberAfOnOffClusterOffCallback(&commandObj, commandPath, commandData);
        }
        chip::app::Clusters::Switch::Attributes::CurrentPosition::Set(epLightID, 0);
        // LongPress event takes newPosition as event data
        switch_cluster::event::send_short_release(epLightID, position);
        lock::chip_stack_unlock();

        if (position == UP)
        {
            OnOff::Commands::On::Type commandDataT;

            client::command_handle_t cmd_handle;
            uint16_t local_endpoint_id = epLightID;
            cmd_handle.cluster_id = OnOff::Id;
            cmd_handle.command_id = OnOff::Commands::On::Id;
            cmd_handle.is_group = true;
            cmd_handle.command_data = &commandDataT;

            lock::status_t lock_status = lock::chip_stack_lock(portMAX_DELAY);
            if (lock_status == lock::FAILED)
            {
                ESP_LOGE(TAG, "Could not lock chip stack");
            }
            client::cluster_update(local_endpoint_id, &cmd_handle);
            if (lock_status == lock::SUCCESS)
            {
                lock::chip_stack_unlock();
            }
            ESP_LOGI(TAG, "Command On");
        }
        else
        {
            OnOff::Commands::Off::Type commandDataT;

            client::command_handle_t cmd_handle;
            uint16_t local_endpoint_id = epLightID;
            cmd_handle.cluster_id = OnOff::Id;
            cmd_handle.command_id = OnOff::Commands::Off::Id;
            cmd_handle.is_group = true;
            cmd_handle.command_data = &commandDataT;

            lock::status_t lock_status = lock::chip_stack_lock(portMAX_DELAY);
            if (lock_status == lock::FAILED)
            {
                ESP_LOGE(TAG, "Could not lock chip stack");
            }
            client::cluster_update(local_endpoint_id, &cmd_handle);
            if (lock_status == lock::SUCCESS)
            {
                lock::chip_stack_unlock();
            }
            ESP_LOGI(TAG, "Command Off");
        }
    }

@jonsmirl
Copy link
Contributor Author

jonsmirl commented Nov 6, 2023

I increased the event buffer sizes and the Dropped error went away. Hopefully those buffers are in PSRAM.

@jonsmirl
Copy link
Contributor Author

Need to sort out #721 and #714 before I can test more on this one.

@dhrishi
Copy link
Collaborator

dhrishi commented Sep 26, 2024

@jonsmirl Did you get a chance to verify this? The above two issues you mentioned are closed. Please close the issue if resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants