How to Only Reset Matter and Matter related nvs data (CON-1529) #1259

Open
bilalahmaddev opened this issue Jan 25, 2025 · 22 comments

Comments

@bilalahmaddev

Is there an API to reset only Matter and the Matter-related NVS data?

@github-actions github-actions bot changed the title How to Only Reset Matter and Matter related nvs data How to Only Reset Matter and Matter related nvs data (CON-1529) Jan 25, 2025
@shubhamdp
Contributor

Can you elaborate on "only reset matter and matter related nvs data"? Do you mean erasing only the Matter-specific data, or are you expecting anything more?

esp_matter::factory_reset() erases only the Matter-specific data, but it also resets (i.e. reboots) the device.

@bilalahmaddev
Author

@shubhamdp I am using the same API. When I flash the code for the first time and run TC-RR-1.1, the test passes. When I then perform a factory reset using esp_matter::factory_reset() and run it again, the test case fails with a timeout error at step 14.

@shubhamdp
Contributor

Can you please try replacing ConfigurationMgr().InitiateFactoryReset(); with chip::Server::GetInstance().ScheduleFactoryReset(); here: https://github.com/espressif/esp-matter/blob/main/components/esp_matter/esp_matter_core.cpp#L894

The patch below does the same:

diff --git a/components/esp_matter/esp_matter_core.cpp b/components/esp_matter/esp_matter_core.cpp
index 0a578a1..f426401 100644
--- a/components/esp_matter/esp_matter_core.cpp
+++ b/components/esp_matter/esp_matter_core.cpp
@@ -891,7 +891,7 @@ esp_err_t factory_reset()
     }

     /* Submodule factory reset. This also restarts after completion. */
-    ConfigurationMgr().InitiateFactoryReset();
+    chip::Server::GetInstance().ScheduleFactoryReset();
     return err;
 }

@bilalahmaddev
Author

@shubhamdp Thanks, I will test tomorrow and let you know.

@shubhamdp
Contributor

@bilalahmaddev did you get a chance to try this out?

@dhrishi
Collaborator

dhrishi commented Mar 3, 2025

@bilalahmaddev Please close the issue if it is resolved.

@bilalahmaddev
Author

@shubhamdp I tried this but it did not work:

diff --git a/components/esp_matter/esp_matter_core.cpp b/components/esp_matter/esp_matter_core.cpp
index 0a578a1..f426401 100644
--- a/components/esp_matter/esp_matter_core.cpp
+++ b/components/esp_matter/esp_matter_core.cpp
@@ -891,7 +891,7 @@ esp_err_t factory_reset()
     }

     /* Submodule factory reset. This also restarts after completion. */
-    ConfigurationMgr().InitiateFactoryReset();
+    chip::Server::GetInstance().ScheduleFactoryReset();
     return err;
 }

@bilalahmaddev
Author

@shubhamdp Can you please help me fix this? With the same API, when I flash the code for the first time and run TC-RR-1.1, the test passes. When I then perform a factory reset using this API, the test case fails with a timeout error at step 14.

But if I run erase_flash and flash the code again, TC-RR-1.1 passes.

@shubhamdp
Contributor

@bilalahmaddev Can you tell me which esp-matter and esp-idf commits you are on? Please also share the TH and DUT logs.

@bilalahmaddev
Author

bilalahmaddev commented Mar 17, 2025

@shubhamdp

esp-idf commit: c9763f62dd00c887a1a8fafe388db868a7e44069
esp-matter commit: 427b40d

TH logs:

TH_logs_TC-RR-1.1.txt

@shubhamdp
Contributor

shubhamdp commented Mar 17, 2025

@bilalahmaddev Can you try bumping the timeout? I tried it with 10000 and it works every time.

python3 TC_RR_1_1.py -m ble-wifi -p 20202021 -d 3840 \
                     --wifi-ssid (SSID) --wifi-passphrase (PSK) \
                     --int-arg use_pase_only:0 --storage-path /tmp/a \
                     -c /tmp/a --timeout 10000

@shubhamdp
Contributor

@bilalahmaddev The NVS state after a fresh flash and after calling esp_matter::factory_reset() is essentially the same. The only differences are that Wi-Fi keeps a few more keys, with their values set to all FF, and that for Matter the unique-id changes, as the specification requires.

@bilalahmaddev
Author

bilalahmaddev commented Mar 17, 2025

@shubhamdp I tried with --timeout 10000 but it still fails:

@shubhamdp
Contributor

I think you should analyze the NVS content after a fresh flash as well as after a factory reset.

You can read the NVS partition using esptool.py. Please make sure the address and size match the nvs entry in your partition table.

esptool.py -p /dev/cu.usbserial-110 read_flash 0x10000 0xc000 /tmp/nvs-after-factory-reset.bin

I usually use this: https://github.com/AFontaine79/Espressif-NVS-Analyzer/blob/main/analyze_nvs.py script for analysis.

analyze_nvs.py /tmp/nvs-after-factory-reset.bin > /tmp/nvs-after-factory-reset.txt

See if you find any differences.
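
The comparison itself can be scripted. What follows is a minimal sketch, assuming both dumps have already been converted to text with the analyzer script. The `diff_dumps` helper and the file paths are illustrative names, not part of any tool mentioned in this thread. It relies only on Python's standard difflib, so it does not depend on the analyzer's output format.

```python
import difflib
from pathlib import Path


def diff_dumps(fresh_text: str, reset_text: str) -> list[str]:
    """Return only the removed ('-') and added ('+') lines between two text dumps."""
    diff = list(
        difflib.unified_diff(
            fresh_text.splitlines(),
            reset_text.splitlines(),
            fromfile="fresh-flash",
            tofile="after-factory-reset",
            lineterm="",
            n=0,  # no context lines, only the changes
        )
    )
    # The first two lines are the '---' / '+++' file headers; '@@' lines mark hunks.
    return [line for line in diff[2:] if not line.startswith("@@")]


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the text dumps were written.
    fresh = Path("/tmp/nvs-fresh-flash.txt")
    reset = Path("/tmp/nvs-after-factory-reset.txt")
    if fresh.exists() and reset.exists():
        for line in diff_dumps(fresh.read_text(), reset.read_text()):
            print(line)
```

Lines starting with `-` exist only in the fresh-flash dump, and lines starting with `+` exist only in the dump taken after the factory reset.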

Dumb question though: did you update the submodules? We recently shipped the groups-related fix in that one.

git submodule update connectedhomeip/connectedhomeip
cd connectedhomeip/connectedhomeip
./scripts/checkout_submodules.py --platform esp32 (host platform) --shallow

@bilalahmaddev
Author

@shubhamdp Ok let me try.

I am on this commit for connectedhomeip: 593d5c6f63a62e017e4ced43183049f2805a9db8

@bilalahmaddev
Author

@shubhamdp We have 14 endpoints on this device. Our previous device has 12 endpoints; it passed this test after factory reset every time and also passed the ATL certification tests. What do you think about the NVS size for 14 endpoints?

# Note: Firmware partition offset needs to be 64K aligned, initial 36K (9 sectors) are reserved for bootloader and partition table
# Name,          Type, SubType,  Offset,   Size,     Flags
esp_secure_cert, 0x3F, ,         0xD000,   0x2000,   encrypted
nvs,             data, nvs,      0x10000,  0x80000,
nvs_keys,        data, nvs_keys, 0x90000,  0xC000,   encrypted
otadata,         data, ota,      0x9C000,  0x2000,
phy_init,        data, phy,      0x9E000,  0x1000,
ota_0,           app,  ota_0,    0x100000, 0x4F0000,
ota_1,           app,  ota_1,    0x5F0000, 0x4F0000,
fctry,           data, nvs,      0xAE0000, 0x6000,
temp_nvs,        data, nvs,      0xAE6000, 0x3000,
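
A table like this can be sanity-checked on the host before flashing. The sketch below assumes every offset and size is written in hex, as it is here; real partition CSVs may also use suffixes such as K or M, or leave offsets blank, and this sketch does not handle those. The function names are illustrative. It checks for overlapping ranges and derives the offset and size to pass to `esptool.py read_flash` for a given partition. For this table the nvs partition is 0x80000 bytes at 0x10000, so reading only 0xc000 bytes would capture just part of it.

```python
import csv
import io

# The partition table from this comment, embedded so the example is self-contained.
PARTITIONS_CSV = """\
# Name,          Type, SubType,  Offset,   Size,     Flags
esp_secure_cert, 0x3F, ,         0xD000,   0x2000,   encrypted
nvs,             data, nvs,      0x10000,  0x80000,
nvs_keys,        data, nvs_keys, 0x90000,  0xC000,   encrypted
otadata,         data, ota,      0x9C000,  0x2000,
phy_init,        data, phy,      0x9E000,  0x1000,
ota_0,           app,  ota_0,    0x100000, 0x4F0000,
ota_1,           app,  ota_1,    0x5F0000, 0x4F0000,
fctry,           data, nvs,      0xAE0000, 0x6000,
temp_nvs,        data, nvs,      0xAE6000, 0x3000,
"""


def parse_partitions(text):
    """Parse a partition CSV into (name, offset, size) tuples. Assumes hex values."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not fields or fields[0].lstrip().startswith("#"):
            continue  # skip blank lines and comments
        name = fields[0].strip()
        offset = int(fields[3].strip(), 16)
        size = int(fields[4].strip(), 16)
        rows.append((name, offset, size))
    return rows


def find_overlaps(parts):
    """Return (name, next_name) pairs whose address ranges overlap."""
    ordered = sorted(parts, key=lambda p: p[1])
    overlaps = []
    for (n1, o1, s1), (n2, o2, _s2) in zip(ordered, ordered[1:]):
        if o1 + s1 > o2:
            overlaps.append((n1, n2))
    return overlaps


def read_flash_args(parts, name):
    """Offset and size, as hex strings, for reading one whole partition."""
    for n, offset, size in parts:
        if n == name:
            return hex(offset), hex(size)
    raise KeyError(name)


if __name__ == "__main__":
    parts = parse_partitions(PARTITIONS_CSV)
    print("overlaps:", find_overlaps(parts))
    offset, size = read_flash_args(parts, "nvs")
    print(f"esptool.py read_flash {offset} {size} /tmp/nvs.bin")
```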

@shubhamdp
Contributor

Our recommendation is to have 48K for 2 endpoints. I think we can still get TC-RR-1.1 to pass for 8 endpoints with this.

You have an NVS partition of 512K, so this should be good enough, I guess. I don't have a number for the per-endpoint NVS overhead (we will need to get this number).

I can suggest one more thing: you can write your own factory reset that erases the complete NVS (assuming you don't have any data that needs to persist across a factory reset, and that such data is stored in "fctry"). This approach is used in our esp-rainmaker framework: https://github.com/espressif/esp-rainmaker-common/blob/6398f401f2d4333cf0ed712d51f8fce3830cadf6/src/utils.c#L76

I suspect two problems here:

  1. Writes could be slow, so the test times out.
  2. It may be fragmentation. NVS needs at least 1 page (4K) free almost all the time so that it can erase a page and make it ready for new writes. (Did increasing the NVS size work?)
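
To put rough numbers on the fragmentation point, here is a back-of-the-envelope sketch. It assumes the standard ESP-IDF NVS layout of 4096-byte pages, each holding 126 entries of 32 bytes, with one page kept free as noted above. The helper name is illustrative, and the result is an upper bound: strings and blobs span several entries, namespaces consume entries too, and erased entries are only reclaimed when a whole page is recycled.

```python
PAGE_SIZE = 4096        # bytes per NVS page (one flash sector)
ENTRY_SIZE = 32         # bytes per NVS entry
ENTRIES_PER_PAGE = 126  # 32-byte header + 32-byte state bitmap + 126 entries = 4096


def nvs_capacity(partition_size):
    """Return (total_pages, usable_pages, max_entries) for an NVS partition size in bytes."""
    if partition_size % PAGE_SIZE:
        raise ValueError("NVS partition size must be a multiple of 4096 bytes")
    total_pages = partition_size // PAGE_SIZE
    usable_pages = total_pages - 1  # one page is assumed to stay free for recycling
    return total_pages, usable_pages, usable_pages * ENTRIES_PER_PAGE


if __name__ == "__main__":
    for label, size in (("48K", 0xC000), ("512K", 0x80000)):
        pages, usable, entries = nvs_capacity(size)
        print(f"{label}: {pages} pages, {usable} usable, at most {entries} entries")
```

By this estimate a 512K partition has 128 pages against 12 pages for 48K, so running out of raw space is less likely than stale entries accumulating across resets.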

@bilalahmaddev
Author

bilalahmaddev commented Mar 17, 2025

@shubhamdp I tried nvs_flash_erase(); but I think it also erases the factory data, and the device becomes uncommissionable.

Image

Increasing the NVS size works sometimes, but not every time.

For testing, I increased the NVS size to 700KB; now TC-RR-1.1 passes the first two times and starts failing after that. That means esp_matter::factory_reset() is not clearing "nvs" -> esp_matter_kvs, and this is creating fragmentation.

With analyze_nvs.py, I see differences in the nvs.net80211 namespace:

Image
also in these

Image

@bilalahmaddev
Author

@shubhamdp I was able to resolve it by explicitly clearing all NVS namespaces before calling esp_matter::factory_reset(), even without increasing the NVS size from 512KB. I tested it 3 times.

@shubhamdp
Contributor

@bilalahmaddev Thanks for the update. Can you please tell us which NVS namespaces you erased?

Also, a suggestion: your factory data should not be stored in the "nvs" partition but in "fctry", so that you can simply erase the complete "nvs" partition on factory reset.

@bilalahmaddev
Author

bilalahmaddev commented Mar 18, 2025

@shubhamdp I erased these: esp_matter_kvs, storage (our application-defined namespace), nvs.net80211, chip-config and CHIP_KVS.

To write factory data to the factory partition (fctry), we should use the address specified in the partition table, which in our case is 0xAE0000. But how will the application automatically find the data in the fctry partition?

We need to set this, right?
CONFIG_CHIP_FACTORY_NAMESPACE_PARTITION_LABEL="fctry"

@shubhamdp
Contributor

To this list [esp_matter_kvs, storage, nvs.net80211, chip-config, CHIP_KVS], you should add chip-counters as well.

Ideally, you should erase your application data first (the storage namespace) and then call esp_matter::factory_reset();. I'm not sure whether nvs.net80211 is causing a problem.

Yes, you will need to write to the partition labeled fctry at address 0xAE0000 and set CONFIG_CHIP_FACTORY_NAMESPACE_PARTITION_LABEL="fctry".

More info can be found at below links
