Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linker: Allow for 999 priority levels in init levels #86999

Merged
merged 4 commits into from
Mar 19, 2025

Conversation

dewitt-garmin
Copy link
Contributor

Some projects may have needs for more than 99 priority levels, so add a third linker input section for each obj level.

Copy link

Hello @dewitt-garmin, and thank you very much for your first pull request to the Zephyr project!
Our Continuous Integration pipeline will execute a series of checks on your Pull Request commit messages and code, and you are expected to address any failures by updating the PR. Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request. If you haven't already, please make sure to review the project's Contributor Expectations and update (by amending and force-pushing the commits) your pull request if necessary.
If you are stuck or need help please join us on Discord and ask your question there. Additionally, you can escalate the review when applicable. 😊

@cfriedt
Copy link
Member

cfriedt commented Mar 12, 2025

Wow - that's a lot of priority levels.

If I may ask a couple of questions

  1. What init level are they being used in? (E.g. EARLY, APPLICATION, ..)
  2. What are the initialization routines doing (approximately)?

The reason I ask, is that we generally want to avoid misusing sys_init to initialize drivers. The device model should really be used instead.

Personally, I have no problem extending the number of levels, but it might not be useful for the average Zephyr user.

Also, there may be some implications to existing services if this lands based on assumed highest / lowest priorities.

@dewitt-garmin
Copy link
Contributor Author

Hi @cfriedt,

Thank you for the quick response. I agree it is a lot of potential levels, but 100 does not seem out of reach for how many modules we may need to initialize. Further we may wish to leave gaps between different modules to allow for future expansion without having to bump all of the priorities to add a new module.

To answer your questions:

  1. Most of the init functions are run in the POST_KERNEL or APPLICATION.
  2. The routines are setting up basic functionality for our modules like initializing mutexes and setting up global state based on settings and the file system. Many of them are mid- or high-level code that do not have a corresponding hardware device that would make sense to initialize with. They used to return void and take a void pointer but we are modifying them to return an integer of a constant 0 to comply with the init_function pointer. In the end the device model would use the same init_entry and have the same limitation of 99 priorities without this change.

I hear your concerns about this being a lot of levels, but it is more just ensuring that we have enough priorities if required. Ideally we get away with 99 but don't want to have a last-minute scramble if we need more. Please let me know if you have any other questions.

Side note: I did not see any test infrastructure for SYS_INIT to modify for this change and did not see anywhere that we tested that 99 was achievable. I can add it somewhere but wasn't sure the most appropriate place, but this also seemed like a fairly straightforward code and documentation change. This also begs the question on if we are allowing 999, should we go for gold and go even larger? There's really no runtime cost to supporting more, just more input sections in the linker script.

Thanks,
Josh DeWitt

@carlescufi
Copy link
Member

Thanks for the PR! Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request.

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there's no particular reason to reject this, it's cleanly implemented and harmless.

But yeah: my gut says this is probably moving in the wrong direction and that we should be looking at SYS_INIT() functions as a legacy thing and aiming innovation at the device model instead, where we can do things like track dependencies instead of just throw darts into a sparse number space of priorities.

@dewitt-garmin
Copy link
Contributor Author

Hi @andyross ,

Can you help me understand how device model solves this problem? Under the hood it uses the same linker input sections, currently limited to 99 priority levels. I'm also not sure how it would be usable for software-only modules that rely on other software modules that are not represented in device tree. It would also seem to use more ROM to keep track of the dependencies whereas this may be more tedious to program but resolves to an array of function pointers by the linker script. Not saying we couldn't benefit from using the device model in more places, I just am not seeing it to be a 1-1 replacement without any tradeoffs. Perhaps I am missing something so please let me know if my understanding is incorrect.

Thanks,
Josh DeWitt

@cfriedt
Copy link
Member

cfriedt commented Mar 13, 2025

@dewitt-garmin - are you runtime-loading elf modules? If so, it might be possible automatically initialize all k-objects.

@dewitt-garmin
Copy link
Contributor Author

Hi @cfriedt,

No, we are not runtime-loading elf modules at this time on this product. We are initializing many different mid- to high-level software modules that we use in all of our standard products, whether they run on Zephyr or not.

Thanks,
Josh DeWitt

@cfriedt
Copy link
Member

cfriedt commented Mar 13, 2025

@dewitt-garmin - in that case, it would really make sense to use K_MUTEX_DEFINE() and skip runtime calls to k_mutex_init() unless there is something I've misunderstood.

https://docs.zephyrproject.org/latest/doxygen/html/group__mutex__apis.html#gab6f3d98fabbdc0918bbc9934d61d63f3

Additional global state would be better to set up in a similar fashion where it's set at compile time and then initialized with the C runtime.

Naturally, there are other potential initialization steps that are more procedural, which would require a SYS_INIT(). This gets tricky because it requires manual separation into separate priorities but only for dependent steps. Independent steps can be initialized using a common priority.

@dewitt-garmin
Copy link
Contributor Author

Hi @cfriedt,

We are dealing with a lot of legacy code that runs on several different platforms, not just zephyr, so we cannot change them to work just for zephyr. I am hoping we can squash these initializations into fewer priorities long term, but we also don't want to wait until we have 98 priorities in use and then come asking for change in a rush out of desperation. So while I hope we won't need more than 100 priorities, I think it is good to have this option on the table just in case.

I don't think there are really any negative impacts to having other than maybe encouraging potentially bad practices, but that would be owned by the people making an application. I say potentially bad because for zephyr code there may be better ways, but for legacy shared code, this might be the easiest path. Am I understanding correctly?

Thanks,
Josh DeWitt

@cfriedt
Copy link
Member

cfriedt commented Mar 13, 2025

Ok - thanks for the details. It helps for clarification.

I have no major objections but it would be good to hear from @gmarull .

Copy link
Member

@gmarull gmarull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections. I'd recommend not to build too much on top of SYS_INIT, though.

Please update zephyr_linker_section_obj_level in cmake/modules/extensions.cmake

@zephyrbot zephyrbot requested a review from tejlmand March 14, 2025 19:43
@dewitt-garmin
Copy link
Contributor Author

Hi @gmarull,

Thanks for the pointer. I found a few more places to update while working on e29c9c9. I can omit that patch if everyone prefer, but it did make things easier to read in the final linker scripts and map files when working on this change.

Also I tried using _CONCAT_5 in linker-defs.h to fix SPACING: spaces required around that '?' (ctx:VxO) but since I can't use double quotes it still showed the error. Do we just ignore it?

Thanks,
Josh DeWitt

@dewitt-garmin dewitt-garmin requested a review from gmarull March 17, 2025 13:45
Copy link
Member

@gmarull gmarull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first commit contains unrelated changes mixed, please, split

@gmarull gmarull requested a review from fabiobaltieri March 17, 2025 13:51
@dewitt-garmin
Copy link
Contributor Author

Hi @gmarull,

first commit contains unrelated changes mixed, please, split

That change is to prepare for the other changes though? We require a completely separate pull request for that minor cleanup? What is your thought on e29c9c9? Should that be a separate pull request too?

Thanks,
Josh DeWitt

@gmarull
Copy link
Member

gmarull commented Mar 17, 2025

Hi @gmarull,

first commit contains unrelated changes mixed, please, split

That change is to prepare for the other changes though? We require a completely separate pull request for that minor cleanup? What is your thought on e29c9c9? Should that be a separate pull request too?

Thanks, Josh DeWitt

Nope, just split commit

@dewitt-garmin
Copy link
Contributor Author

Hi @gmarull,

You mean the test change versus the linker clang format change? Sure, I can split up that commit.
Or are you saying I need a separate pull request?

Thanks,
Josh DeWitt

clang-format will remove the space after the equals sign causing link
issues. Disable clang-format for some linker macros since they should
not be formatted as C code.

Signed-off-by: Josh DeWitt <josh.dewitt@garmin.com>
Ensure the priority is surrounded by underscores for clarity. This makes
sections show up as
z_init_PRE_KERNEL_1_0_0_ instead of
z_init_PRE_KERNEL_10_0_

Signed-off-by: Josh DeWitt <josh.dewitt@garmin.com>
Add a test to ensure that the max expected 2-digit SYS init priority
works.

Signed-off-by: Josh DeWitt <josh.dewitt@garmin.com>
Copy link
Member

@fabiobaltieri fabiobaltieri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, feels excessive but I guess you have a use case for it and there's no drawbacks so I guess why not. Also thanks for taking the time to add that underscore to the object symbol file.

Compliance is finding some issues, I think it's good if you can fix those, right now since it's your first PR someone has to approve the workflow but you can also run it locally with ./scripts/ci/check_compliance.py.

@dewitt-garmin
Copy link
Contributor Author

Hi @fabiobaltieri,

Compliance is finding some issues, I think it's good if you can fix those, right now since it's your first PR someone has to approve the workflow but you can also run it locally with ./scripts/ci/check_compliance.py.

Did you see my reply here? I do not know how to make it pass check_compliance.py since the current code fails and there does not seem to be a way to satisfy it. Is there an inline suppression I can do similar to clang-format?

Also I tried using _CONCAT_5 in linker-defs.h to fix SPACING: spaces required around that '?' (ctx:VxO) but since I can't use double quotes it still showed the error. Do we just ignore it?

Thanks,
Josh DeWitt

Some projects may have needs for more than 99 priority levels, so add
a third linker input section for each obj level.

Signed-off-by: Josh DeWitt <josh.dewitt@garmin.com>
@fabiobaltieri
Copy link
Member

Did you see my reply here? I do not know how to make it pass check_compliance.py since the current code fails and there does not seem to be a way to satisfy it. Is there an inline suppression I can do similar to clang-format?

Sorry missed that. If it's a false positive then just let it fail, we can ask the org admin to bypass the check and merge it.

@dewitt-garmin dewitt-garmin requested a review from gmarull March 19, 2025 17:31
@nashif nashif merged commit 0ae0c3d into zephyrproject-rtos:main Mar 19, 2025
21 of 22 checks passed
Copy link

Hi @dewitt-garmin!
Congratulations on getting your very first Zephyr pull request merged 🎉🥳. This is a fantastic achievement, and we're thrilled to have you as part of our community!

To celebrate this milestone and showcase your contribution, we'd love to award you the Zephyr Technical Contributor badge. If you're interested, please claim your badge by filling out this form: Claim Your Zephyr Badge.

Thank you for your valuable input, and we look forward to seeing more of your contributions in the future! 🪁

@GoranWall
Copy link

Unfortunately this completely broke linking for IAR. It would really help if there were some tests run with IAR.

@RobinKastberg
Copy link
Contributor

Looks like the wildcards are ambigous?

If this is due to CMAKE_LINKER_GENERATOR or affects LD silently as well I will need to be looked into

Error[Lc037]: ambiguous section match: "ro data section .z_init_PRE_KERNEL_2_0_0_ in cortex_m_systick.c.o" matches more than one pattern
            "section .z_init_PRE_KERNEL_2_?_*" (at line 41 of "/workdir/zephyr/twister-out/qemu_cortex_m3_ti_lm3s6965/iar/tests/kernel/semaphore/semaphore/kernel.semaphore/zephyr/linker_zephyr_pre0.cmd")
            "section .z_init_PRE_KERNEL_2_???_*" (at line 43 of "/workdir/zephyr/twister-out/qemu_cortex_m3_ti_lm3s6965/iar/tests/kernel/semaphore/semaphore/kernel.semaphore/zephyr/linker_zephyr_pre0.cmd")

I am not too well versed in this to propose a hotfix. Let's revert? @nashif @kartben ?

@fabiobaltieri
Copy link
Member

I am not too well versed in this to propose a hotfix. Let's revert? @nashif @kartben ?

Not much point reverting if the author (or the bulk of the community) has no access to the tooling to fix it, until that's the case I think these should be forward fixed, there were some discussions about this in #63738.

@RobinKastberg
Copy link
Contributor

@fabiobaltieri sure thing I appreciate that we haven't had a satisfactory solution here for the proprietary toolchains.
I see no problems in forward changing for fixing IAR specific problems,

but the fact that
.z_init_PRE_KERNEL_2_0_0_ is matched by both
.z_init_PRE_KERNEL_2_?_*
and
.z_init_PRE_KERNEL_2_???_* does not require any particular toolchain to understand and
sounds like a fundamental problem with this PR, that would cause subtle silent reordering problems also on ld?

@fabiobaltieri
Copy link
Member

fabiobaltieri commented Mar 20, 2025

but the fact that .z_init_PRE_KERNEL_2_0_0_ is matched by both .z_init_PRE_KERNEL_2_?_* and .z_init_PRE_KERNEL_2_???_* does not require any particular toolchain to understand and sounds like a fundamental problem with this PR, that would cause subtle silent reordering problems also on ld?

Oh ok, I missed the details about it and somehow though it was only impacting the IAR linker, yes then I think this justifies a rollback, maybe just do the last two commits, I think the first two can stay.

Unless we can come up with a hotfix of course, wouldn't it be enough to disambiguate the quialifier? like adding an extra underscore qualifier between the last two digits? Since apparently your linker can detect that and LD can't (...) maybe you could try that out and propose a fix forward like that? And have a go while the rollback CI runs, that takes ~1h anyway so we have some time to propose an alternative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.