Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debug: coredump: add internal flash backend based on nrfx #21418

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Damian-Nordic
Copy link
Contributor

Add Zephyr core dump backend that saves a core dump to the internal flash or RRAM partition named "coredump_partition".

This backend is an alternative to BACKEND_FLASH_PARTITION provided by Zephyr but it bypasses Zephyr flash device layer and uses nrfx directly, which offers the following benefits:

  1. Bypasses synchronization primitives used by Zephyr flash or RRAM drivers. Currently, Zephyr flash drivers cannot be used to write to flash from a fault handler because of this.
  2. Works with Partition Manager.
  3. Minimizes the dependencies needed to successfully write a core dump, which is important as the core dump often needs to be written when the system is in the corrupted state.

Only flash and RRAM are supported for now (no MRAM support).

@Damian-Nordic Damian-Nordic requested review from a team and nordic-krch as code owners April 4, 2025 11:44
@github-actions github-actions bot added the doc-required PR must not be merged without tech writer approval. label Apr 4, 2025
@NordicBuilder
Copy link
Contributor

NordicBuilder commented Apr 4, 2025

CI Information

To view the history of this post, clich the 'edited' button above
Build number: 5

Inputs:

Sources:

sdk-nrf: PR head: 346ee03e6a264430ba88d3522a67cda38eb2a9b8

more details

sdk-nrf:

PR head: 346ee03e6a264430ba88d3522a67cda38eb2a9b8
merge base: 051cce44210010252b787bfd13c78f8a64d9342b
target head (main): 051cce44210010252b787bfd13c78f8a64d9342b
Diff

Github labels

Enabled Name Description
ci-disabled Disable the ci execution
ci-all-test Run all of ci, no test spec filtering will be done
ci-force-downstream Force execution of downstream even if twister fails
ci-run-twister Force run twister
ci-run-zephyr-twister Force run zephyr twister
List of changed files detected by CI (7)
CODEOWNERS
doc
│  ├── nrf
│  │  ├── releases_and_maturity
│  │  │  ├── releases
│  │  │  │  │ release-notes-changelog.rst
subsys
│  ├── debug
│  │  ├── CMakeLists.txt
│  │  ├── Kconfig
│  │  ├── coredump
│  │  │  ├── CMakeLists.txt
│  │  │  ├── Kconfig
│  │  │  │ coredump_backend_nrf_flash_partition.c

Outputs:

Toolchain

Version: 7cbc0036f4
Build docker image: docker-dtr.nordicsemi.no/sw-production/ncs-build:7cbc0036f4_8bf7ca4353

Test Spec & Results: ✅ Success; ❌ Failure; 🟠 Queued; 🟡 Progress; ◻️ Skipped; ⚠️ Quarantine

  • ◻️ Toolchain - Skipped: existing toolchain is used
  • ✅ Build twister
    • sdk-nrf test count: 263
  • ✅ Integration tests
Disabled integration tests
    • desktop52_verification
    • doc-internal
    • test_ble_nrf_config
    • test-fw-nrfconnect-apps
    • test-fw-nrfconnect-ble_mesh
    • test-fw-nrfconnect-ble_samples
    • test-fw-nrfconnect-boot
    • test-fw-nrfconnect-chip
    • test-fw-nrfconnect-fem
    • test-fw-nrfconnect-nfc
    • test-fw-nrfconnect-nrf-iot_cloud
    • test-fw-nrfconnect-nrf-iot_libmodem-nrf
    • test-fw-nrfconnect-nrf-iot_mosh
    • test-fw-nrfconnect-nrf-iot_positioning
    • test-fw-nrfconnect-nrf-iot_samples
    • test-fw-nrfconnect-nrf-iot_serial_lte_modem
    • test-fw-nrfconnect-nrf-iot_thingy91
    • test-fw-nrfconnect-nrf-iot_zephyr_lwm2m
    • test-fw-nrfconnect-nrf_crypto
    • test-fw-nrfconnect-proprietary_esb
    • test-fw-nrfconnect-ps
    • test-fw-nrfconnect-rpc
    • test-fw-nrfconnect-rs
    • test-fw-nrfconnect-tfm
    • test-fw-nrfconnect-thread
    • test-low-level
    • test-sdk-audio
    • test-sdk-dfu
    • test-sdk-find-my
    • test-sdk-mcuboot
    • test-sdk-pmic-samples
    • test-sdk-wifi
    • test-secdom-samples-public

Note: This message is automatically posted and updated by the CI

@@ -938,7 +938,8 @@ Common Application Framework
Debug libraries
---------------

|no_changes_yet_note|
* Added an experimental :ref:`Zephyr Core Dump <zephyr:coredump>` backend that writes a core dump to an internal flash or RRAM partition.
To enable this backend set the :kconfig:option:`CONFIG_DEBUG_COREDUMP_BACKEND_OTHER` and :kconfig:option:`CONFIG_DEBUG_COREDUMP_BACKEND_NRF_FLASH_PARTITION` Kconfig options.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To enable this backend set the :kconfig:option:`CONFIG_DEBUG_COREDUMP_BACKEND_OTHER` and :kconfig:option:`CONFIG_DEBUG_COREDUMP_BACKEND_NRF_FLASH_PARTITION` Kconfig options.
To enable this backend, set the :kconfig:option:`CONFIG_DEBUG_COREDUMP_BACKEND_OTHER` and :kconfig:option:`CONFIG_DEBUG_COREDUMP_BACKEND_NRF_FLASH_PARTITION` Kconfig options.

Copy link

github-actions bot commented Apr 4, 2025

You can find the documentation preview for this PR here.

Preview links for modified nRF Connect SDK documents:

https://ncsdoc.z6.web.core.windows.net/PR-21418/nrf/releases_and_maturity/releases/release-notes-changelog.html

Comment on lines 125 to 129
if ((offset % FLASH_WRITE_SIZE) != 0) {
write_error = -EINVAL;
return;
}

if (!is_within_partition(offset, size)) {
write_error = -ENOMEM;
return;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be asserts. There is nothing we can do at this point, there is no recovery and returning error from here is pointless, because the problem is with the caller not properly aligning buffers.
Tests here are just adding to code, while this is the last stand driver that should have minimal code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding alignment, I agree, though do you think it is OK to trigger another fatal error (assertion) while processing the previous one? Won't it cause a core dump loop? Maybe I can just remove this check and let driver layer underneath fail if the alignment is wrong?

Regarding is_within_partition, I think we should keep it even when assertions are disabled - one may simply allocate to small partition (which is easy to misconfigure) and we don't want to do out of bounds writes in such a case. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding alignment, I agree, though do you think it is OK to trigger another fatal error (assertion) while processing the previous one? Won't it cause a core dump loop? Maybe I can just remove this check and let driver layer underneath fail if the alignment is wrong?

Yeah, I think we can just rely here on driver failing anyway.

Regarding is_within_partition, I think we should keep it even when assertions are disabled - one may simply allocate to small partition (which is easy to misconfigure) and we don't want to do out of bounds writes in such a case. What do you think?

Let it be as it as is.

The perfect solution here would be if we could run build in "test configuration" mode, where we could catch all problems that are misconfiguration at compile time. Because, notice that you will not have info that partition is too small until your core dump does not fit.

@Damian-Nordic
Copy link
Contributor Author

@nordic-krch @nordicjm could you please review?

Add Zephyr core dump backend that saves a core dump to the
internal flash or RRAM partition named "coredump_partition".

This backend is an alternative to BACKEND_FLASH_PARTITION
provided by Zephyr but it bypasses Zephyr flash device layer
and uses nrfx directly, which offers the following benefits:

1. Bypasses synchronization primitives used by Zephyr flash
   or RRAM drivers. Currently, Zephyr flash drivers cannot
   be used to write to flash from a fault handler because
   of this.
2. Works with Partition Manager.
3. Minimizes the dependencies needed to successfully write
   a core dump, which is important as the core dump often
   needs to be written when the system is in the corrupted
   state.

Only flash and RRAM are supported for now (no MRAM support).

Signed-off-by: Damian Krolik <damian.krolik@nordicsemi.no>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-required PR must not be merged without tech writer approval.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants