build: A high-level mechanism for embedding raw data from binary files #86204

hIAhmad · 2025-02-24T03:46:18Z

Adding CMake functions generate_binary_embed_inc_* to introduce a portable
high-level build mechanism for embedding raw binary data.

These functions enable including raw data from one or more binary files,
along with desired symbol names for corresponding data being embedded.
Controlling aspects such as address alignment, linker section, inclusion
as read-only (default) or as writable, data compression and including only
a subset of data from the specified binary files, is supported through
named optional parameters.

These functions provide a higher-level embedding mechanism compared to
generate_inc_* functions, enabling advanced use cases like:

Embedding of data from more than one files, where the list and number
of files being included is determined at build time, e.g. driven by
some device-tree fields or config options. generate_inc_* functions
cannot handle such use-cases without a fair amount of additional custom
support code added to corresponding CMake build files.
(See drivers: ramdisk: Add support for embedded disk images as backing store of RAM disks #82603
for an example use-case that will be greatly simplified, if this proposed
generic mechanism becomes available).
generate_inc_* functions embed binary data by first converting it
into C-source compilable form; which is fine for small binary files
but for larger files this can increase build time and resources usage
significantly. Whereas these new functions can transparently make use
of GNU-style "incbin" directive, when available (and usable), for much
faster inclusion of binary files, automatically falling back to using
generate_inc_* functions internally, for other cases.

Testing:

Tested on following platforms:

native_sim
qemu_x86
qemu_cortex_a9
qemu_cortex_a53
qemu_cortex_m3
qemu_cortex_r5
qemu_riscv32
qemu_riscv64
qemu_xtensa/dc233c
qemu_arc/qemu_arc_hs

Tested both with and without Zephyr CMake linker generator feature (on qemu_cortex_m3).

Related PRs:

This new feature is inspired by discussion under related drivers: ramdisk: Add support for embedded disk images as backing store of RAM disks #82603 (which will be rebased to use this generic feature, once approved).
Depends on ~~toolchain: Add assembly-side counterpart of __aligned #85758~~ and ~~scripts: build: Add optimizations in file2hex.py to gzip path as well #85900~~ to be merged first.

Note:
Changes to include/zephyr/toolchain/common.h and scripts/build/file2hex.py are not part of this PR - they are coming from the separate PRs that this PR depends on. I've added them here to get early feedback on rest of the feature changeset. These changes will be removed from this PR (through rebase) once the respective PRs get approved and merged to main.

include/zephyr/toolchain/common.h

nordicjm · 2025-02-26T08:13:26Z

cmake/gen_binary_embed_inc.cmake

+                             " */\n")
+
+if(USE_INCBIN)
+file(WRITE ${GENERATED_FILE}.S "/*\n"


2 space indent for cmake

nordicjm · 2025-02-26T08:28:07Z

cmake/modules/extensions.cmake

+#   ALIGN         - Alignment for data being embedded in bytes. Default is 4.
+#   SECTION       - Name of the desired linker section. Data from all specified
+#                   binary files will be placed in the given section name.
+#   ZIP           - Compress the data with specified compression method: gzip or no (default).


gzip != zip, and don't think that has a place here, if you need a zip then the zip file should be present and provided to the function (or compressed manually with a user build step then included)

Yes, ZIP as the name of the parameter is a bit arguable. While the word zip can sometimes be used as a generic word for compression it is also a name for a specific compression method. COMPRESS can be an alternative name here, that does not suffer from this ambiguity.

Now, regarding your other point - whether this parameter should even be provided in first place. It is true that a file can be compressed in a separate build step, prior to passing it on to generate_binary_embed_inc_file but its lower-level counterpart generate_inc_file already provides this capability so I think it is better to preserve the existing capability of that function in this higher-level wrapper (and it is certainly more convenient than having to do it in a separate build step).

I have renamed the ZIP parameter to COMPRESS.

nordicjm · 2025-02-26T08:36:09Z

cmake/modules/extensions.cmake

+#                   specified files should be started.
+#   LENGTH        - Length, in bytes, of the data subset to be included from
+#                   the specified files.
+#   READONLY      - yes (default) or no.


this should be a flag and should be CONST not READONLY

this should be a flag and should be CONST not READONLY

I am a bit divided on the name. I had thought about naming it CONST but then felt that the parameter name should better reflect the eventual logical effect that normal/intended use of the parameter should achieve, rather than an implementation detail so chose READONLY instead. Let's wait for some more feedback on the naming and then consider renaming it accordingly.

~~I'll make it a flag right away, though.~~ (see below: #86204 (comment)).

@nordicjm There is a downside to making this parameter a flag. By default, you'd typically include/embed binary data as read-only, and not as writable. So if we make this a flag, it will always have to be specified for the common and typical case. By making it take a value, you need to specify it only if you want to make the included data writable (which is relatively a rare case, currently, #82603 would be likely the only such practical use in the source tree).

I think it is better to have it take a value.

Alternatively, we can invert the logic here and instead of READONLY, make it WRITABLE. Then we can make it a flag without having to specify it for the common, default case.

Thoughts?

What I find problematic is that the only documentation for generate_binary_embed_inc_file_for_target() and friends is:

# READONLY - yes (default) or no.

This tells me nothing about what READONLY actually means, I need to go look at the script which generates the including code.

This needs to be improved, and in this context I believe using the flag / parameter name CONST will ease the understanding of the flag.

Adding CMake functions generate_binary_embed_inc_* to introduce a portable high-level build mechanism for embedding raw binary data. These functions enable including raw data from one or more binary files, along with desired symbol names for corresponding data being embedded. Controlling aspects such as address alignment, linker section, inclusion as read-only (default) or as writable, data compression and including only a subset of data from the specified binary files, is supported through named optional parameters. These functions provide a higher-level embedding mechanism compared to generate_inc_* functions, enabling advanced use cases like: - Embedding of data from more than one files, where the list and number of files being included is determined at build time, e.g. driven by some device-tree fields or config options. generate_inc_* functions cannot handle such use-cases without a fair amount of additional custom support code added to corresponding CMake build files. - generate_inc_* functions embed binary data by first converting it into C-source compilable form; which is fine for small binary files but for larger files this can increase build time and resources usage significantly. Whereas these new functions can transparently make use of GNU-style "incbin" directive, when available (and usable), for much faster inclusion of binary files, automatically falling back to using generate_inc_* functions internally, for other cases. Signed-off-by: Irfan Ahmad <irfan.ahmad@siemens.com>

Add a test that exercises generate_binary_embed_inc_* CMake functions to embed raw binary data from two test binary files, into an application, in a comprehensive set of permutations of supported parameters. Signed-off-by: Irfan Ahmad <irfan.ahmad@siemens.com>

hIAhmad · 2025-03-24T18:34:11Z

@tejlmand, @nashif, @nordicjm The dependencies of this PR are now out of the way (approved and merged). It would be great if you can spare some time to review/re-review this PR.

tejlmand

Thanks for the improved version.

Still a little in doubt on the .incbin, but that's for another discussion.
First round reviewing the implementation itself.

tejlmand · 2025-04-07T09:30:03Z

cmake/modules/extensions.cmake

+# Required named parameters:
+#
+#   OUTPUT        - The generated file.
+#   BINARY_FILES  - List of binary files to be converted into embeddable form.


Perhaps just files. The function name, function and parameter description already make it clear that the function works with binary files

Suggested change

# BINARY_FILES - List of binary files to be converted into embeddable form.

# FILES - List of binary files to be converted into embeddable form.

Note: not blocking comment.

tejlmand · 2025-04-07T09:35:40Z

cmake/modules/extensions.cmake

+#   OFFSET        - Offset, in bytes, from which data inclusion from the
+#                   specified files should be started.
+#   LENGTH        - Length, in bytes, of the data subset to be included from
+#                   the specified files.


This could be too limited.
If wanting to support multiple input files, then it should also be possible to have different offset or length from those files.

It might be that your current use-case will be happy with using same offset or length for all files in a list, but this can easily be too limiting in future.

tejlmand · 2025-04-07T09:49:41Z

cmake/gen_binary_embed_inc.cmake

+                                  "#include <zephyr/linker/sections.h>\n")
+endif()
+
+if(${READONLY} STREQUAL yes)


Just do:

Suggested change

if(${READONLY} STREQUAL yes)

if(READONLY)

Note: comment on naming is discussed else-where.

tejlmand · 2025-04-07T09:53:12Z

cmake/modules/extensions.cmake

+#                   specified files should be started.
+#   LENGTH        - Length, in bytes, of the data subset to be included from
+#                   the specified files.
+#   READONLY      - yes (default) or no.


What I find problematic is that the only documentation for generate_binary_embed_inc_file_for_target() and friends is:

# READONLY - yes (default) or no.

This tells me nothing about what READONLY actually means, I need to go look at the script which generates the including code.

This needs to be improved, and in this context I believe using the flag / parameter name CONST will ease the understanding of the flag.

tejlmand · 2025-04-07T09:54:03Z

cmake/modules/extensions.cmake

+#   ALIGN         - Alignment for data being embedded in bytes. Default is 4.
+#   SECTION       - Name of the desired linker section. Data from all specified
+#                   binary files will be placed in the given section name.
+#   COMPRESS      - Compress the data with specified compression method: gzip or no (default).


Please using none or uncompressed.

tejlmand · 2025-04-07T10:45:16Z

cmake/modules/extensions.cmake

+  set(generated_files ${generated_file})
+  if(use_incbin)
+    list(APPEND generated_files ${generated_file}.S)
+    zephyr_library_sources(${generated_file}.S)


Why is the generated assemble file added to the current Zephyr library, even when dedicated target is specified ?

tejlmand · 2025-04-07T10:47:03Z

cmake/gen_binary_embed_inc.cmake

+  else()
+    set(bin_file_size ${LENGTH})
+  endif()
+  if(${COMPRESS} STREQUAL no)


Suggested change

if(${COMPRESS} STREQUAL no)

if(COMPRESS)

and then swap the if and else bodies.

tejlmand · 2025-04-07T10:47:37Z

cmake/gen_binary_embed_inc.cmake

+  # Include the binary data.
+  if(USE_INCBIN)
+    if("${SECTION}" STREQUAL "")
+      if(${READONLY} STREQUAL yes)


Suggested change

if(${READONLY} STREQUAL yes)

if(READONLY)

tejlmand · 2025-04-07T10:51:27Z

cmake/gen_binary_embed_inc.cmake

+
+  # Include the binary data.
+  if(USE_INCBIN)
+    if("${SECTION}" STREQUAL "")


Seems that the use of READONLY in generate_binary_embed_inc() not only effects the use of the const qualifier but also the section placement (unless SECTION is also used).

However this is not documented anywhere afaict.

tejlmand · 2025-04-07T10:52:14Z

tests/application_development/gen_binary_embed_inc_file/CMakeLists.txt

Nice with tests 👍

hIAhmad force-pushed the gen_binary_embed_inc_file branch 3 times, most recently from 33db449 to 7e400f1 Compare February 25, 2025 17:41

hIAhmad mentioned this pull request Feb 25, 2025

scripts: build: Add support for generating string literals in file2hex.py #84116

Merged

hIAhmad marked this pull request as ready for review February 26, 2025 07:01

zephyrbot added area: Toolchains Toolchains area: Build System labels Feb 26, 2025

zephyrbot requested review from 57300, jeremybettis, nashif, nordicjm, stephanosio and tejlmand February 26, 2025 07:02

zephyrbot assigned tejlmand Feb 26, 2025

nordicjm reviewed Feb 26, 2025

View reviewed changes

hIAhmad force-pushed the gen_binary_embed_inc_file branch 2 times, most recently from ac3355f to afd4afb Compare February 26, 2025 10:31

hIAhmad force-pushed the gen_binary_embed_inc_file branch from afd4afb to 2315b09 Compare March 7, 2025 19:14

hIAhmad added 2 commits March 24, 2025 23:09

hIAhmad force-pushed the gen_binary_embed_inc_file branch from 2315b09 to eae0319 Compare March 24, 2025 18:10

57300 removed their request for review April 1, 2025 19:02

tejlmand requested changes Apr 7, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

build: A high-level mechanism for embedding raw data from binary files #86204

build: A high-level mechanism for embedding raw data from binary files #86204

hIAhmad commented Feb 24, 2025 •

edited

Loading

nordicjm Feb 26, 2025

hIAhmad Feb 26, 2025

nordicjm Feb 26, 2025

hIAhmad Feb 26, 2025 •

edited

Loading

hIAhmad Feb 26, 2025

nordicjm Feb 26, 2025

hIAhmad Feb 26, 2025 •

edited

Loading

hIAhmad Feb 26, 2025 •

edited

Loading

tejlmand Apr 7, 2025

hIAhmad commented Mar 24, 2025

tejlmand left a comment

tejlmand Apr 7, 2025

tejlmand Apr 7, 2025

tejlmand Apr 7, 2025

tejlmand Apr 7, 2025

tejlmand Apr 7, 2025

tejlmand Apr 7, 2025

tejlmand Apr 7, 2025

tejlmand Apr 7, 2025

tejlmand Apr 7, 2025

tejlmand Apr 7, 2025

	# BINARY_FILES - List of binary files to be converted into embeddable form.
	# FILES - List of binary files to be converted into embeddable form.

build: A high-level mechanism for embedding raw data from binary files #86204

Are you sure you want to change the base?

build: A high-level mechanism for embedding raw data from binary files #86204

Conversation

hIAhmad commented Feb 24, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hIAhmad Feb 26, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hIAhmad Feb 26, 2025 • edited Loading

Choose a reason for hiding this comment

hIAhmad Feb 26, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hIAhmad commented Mar 24, 2025

tejlmand left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hIAhmad commented Feb 24, 2025 •

edited

Loading

hIAhmad Feb 26, 2025 •

edited

Loading

hIAhmad Feb 26, 2025 •

edited

Loading

hIAhmad Feb 26, 2025 •

edited

Loading