Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: timer : cortex_m_systick MAX_TICKS protection #83450

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hakehuang
Copy link
Collaborator

@hakehuang hakehuang commented Dec 30, 2024

when CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC set to 960M and CONFIG_SYS_CLOCK_TICKS_PER_SEC set to 100
the MAX_TICKS will be zero or even a negative value, which is not expected, at this time the COUNTER_MAX is
so need add a protection here

fixes: #36766

there used to be a workaround, not a fix,

either change the CONFIG_SYS_CLOCK_TICKS_PER_SEC=200 or
CONFIG_PM to set the CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC to 32678

@hakehuang
Copy link
Collaborator Author

test on mimxrt1170_evk without default sys_tick settings

*** Booting Zephyr OS build v4.0.0-2778-g1b70bfa14e28 ***
Running TESTSUITE tickless_concept
===================================================================
START - test_tickless_slice
elapsed slice 110, expected: <100, 110>
elapsed slice 100, expected: <100, 110>
elapsed slice 100, expected: <100, 110>
elapsed slice 100, expected: <100, 110>
 PASS - test_tickless_slice in 0.607 seconds
===================================================================
START - test_tickless_sysclock
time 640, 850
time 860, 1060
 PASS - test_tickless_sysclock in 0.426 seconds
===================================================================
TESTSUITE tickless_concept succeeded

------ TESTSUITE SUMMARY START ------

SUITE PASS - 100.00% [tickless_concept]: pass = 2, fail = 0, skip = 0, total = 2 duration = 1.033 seconds
 - PASS - [tickless_concept.test_tickless_slice] duration = 0.607 seconds
 - PASS - [tickless_concept.test_tickless_sysclock] duration = 0.426 seconds

------ TESTSUITE SUMMARY END ------

===================================================================
PROJECT EXECUTION SUCCESSFUL

@hakehuang hakehuang force-pushed the sys_tick_max_count_protection branch from 034f78b to ceacafd Compare December 30, 2024 11:43
#define MAX_TICKS ((k_ticks_t)(COUNTER_MAX / CYC_PER_TICK) - 1)
/* add MAX_TICKS protection */
#define _COUNTER_MAX ((COUNTER_MAX > (2 * CYC_PER_TICK)) ? COUNTER_MAX : (2 * CYC_PER_TICK))
#define MAX_TICKS ((k_ticks_t)(_COUNTER_MAX / CYC_PER_TICK) - 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems clumsy. Can't you add a BUILD_ASSERT() in 64 bit precision instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_assert only assert during building, it does not resolve the problem, so if we want to CONFIG_SYS_CLOCK_TICKS_PER_SEC to be 100 and still running in that high frequency(960M). let me try to find a better way to implement this.

@gramsay0
Copy link
Collaborator

@hakehuang here's an old attempt at this that may be a useful reference: 5cbf8d8 #66601

@hakehuang
Copy link
Collaborator Author

hakehuang commented Dec 31, 2024

@hakehuang here's an old attempt at this that may be a useful reference: 5cbf8d8 #66601

I guess the pr attempts to assert the setting in return we still need to increate the CONFIG_SYS_CLOCK_TICKS_PER_SEC to avoid this issue, and my attempt to still use the high frequency, but less tick, by downgrading the accuracy to its up-limits

@hakehuang hakehuang force-pushed the sys_tick_max_count_protection branch 2 times, most recently from 7622a6b to 282f2b2 Compare December 31, 2024 04:38
@hakehuang hakehuang requested a review from andyross December 31, 2024 09:01
Copy link
Collaborator

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Driving back by. Seems like the real bug here is that "- 1" subexpression that's rolling over when the quotient is zero. Why is that there? It's not mathematically correct. Is it to handle latency slop in the reset code maybe?

Nonetheless if you want the expression to be safe, it's IMHO more clearly expressed as something like this, which clamps the quotient before the subtraction to guarantee no rollover:

#define MAX_TICKS CLAMP(COUNTER_MAX / CYC_PER_TICK, 1, COUNTER_MAX) - 1

But your version seems fine too. I'm just concerned about that term and where it came from.

@andyross
Copy link
Collaborator

Heh, it was my doing, back in 2018 during the timer rewrites: see commit c535300

I think it's just been carried along voodoo-style ever since. It's harmless (except for the possibility of rollover at high clock rates), but I really don't think it's needed. Might be worth spinning a version that simply removes that subtraction and check that nothing breaks.

Basically I think the root cause here is a bug I wrote six years ago. :)

@andyross
Copy link
Collaborator

Also, just to be clear, the configuration being managed here is pathological: the clock rates is so high and the tick rate so low that it's not possible to deliver interrupts that slowly from the paltry 24 bit SysTick counter. Such a configuration basically has no control over timer interrupt frequency and would be much better served with a regular tick interrupt (which is what it's getting anyway) that avoids all the runtime complexity.

@hakehuang
Copy link
Collaborator Author

Also, just to be clear, the configuration being managed here is pathological: the clock rates is so high and the tick rate so low that it's not possible to deliver interrupts that slowly from the paltry 24 bit SysTick counter. Such a configuration basically has no control over timer interrupt frequency and would be much better served with a regular tick interrupt (which is what it's getting anyway) that avoids all the runtime complexity.

@andyross will you propose a consolidate solution? my fix is to use the maximum of the 24bit counter if the original setting exceed. I can close my PR, if so. Thanks for checking.

andyross
andyross previously approved these changes Jan 3, 2025
Copy link
Collaborator

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: I don't think there's anything wrong with this fix. It's just that (1) it's working around a rollover condition that shouldn't have been there to begin with, just to (2) handle a pathological case where "the timer can't even count as high as one tick".

I don't know the Right solution there, but I guess in my head it would look nicer as a BUILD_ASSERT() that produces a "Tickless does nothing here, use a ticked kernel" error or something.

@hakehuang
Copy link
Collaborator Author

(2) handle a pathological case where "the timer can't even count as high as one tick".

I don't know the Right solution there, but I guess in my head it would look nicer as a BUILD_ASSERT() that produces a "Tickless does nothing here, use a ticked kernel" error or something.

I add a message to show this, as warning will break CI.

@hakehuang
Copy link
Collaborator Author

@andyross please help to review again, thanks.

andyross
andyross previously approved these changes Jan 7, 2025
Copy link
Collaborator

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great here.

@hakehuang
Copy link
Collaborator Author

@teburd please help to check thanks

@hakehuang
Copy link
Collaborator Author

@teburd please help to review

@golowanow
Copy link
Member

@hakehuang - FYI #84051 (comment)

@hakehuang
Copy link
Collaborator Author

@hakehuang - FYI #84051 (comment)

does this pr fix your issue?

@hakehuang
Copy link
Collaborator Author

@teburd can you help to review this? thanks

@golowanow
Copy link
Member

@hakehuang - FYI #84051 (comment)

does this pr fix your issue?

together with #84050 they resolve both overflows and timing instability like these:

Time Measurements for multiq sched queues
Timing results: Clock frequency: 25 MHz
REC: sched.add.thread.ReadyQ_tail.min         - Add threads of decreasing priority, min.           :     387 cycles ,   15480 ns :
REC: sched.add.thread.ReadyQ_tail.max         - Add threads of decreasing priority, max.           : 4278577 cycles , 171143080 ns :
REC: sched.add.thread.ReadyQ_tail.avg         - Add threads of decreasing priority, avg.           :  384082 cycles , 15363280 ns :
REC: sched.add.thread.ReadyQ_tail.stddev      - Add threads of decreasing priority, stddev.        : 1220073 cycles , 48802920 ns :
ReadyQ.add.to.tail.0000.waiters          - Add thread of priority (0)     : 4261800 cycles , 170472023 ns
ReadyQ.add.to.tail.0001.waiters          - Add thread of priority (0)     :     387 cycles ,    15486 ns
ReadyQ.add.to.tail.0002.waiters          - Add thread of priority (0)     :     387 cycles ,    15490 ns
ReadyQ.add.to.tail.0003.waiters          - Add thread of priority (0)     :     387 cycles ,    15490 ns
ReadyQ.add.to.tail.0004.waiters          - Add thread of priority (0)     : 4261800 cycles , 170472000 ns
ReadyQ.add.to.tail.0005.waiters          - Add thread of priority (0)     :     387 cycles ,    15486 ns
ReadyQ.add.to.tail.0006.waiters          - Add thread of priority (0)     :     387 cycles ,    15486 ns

@hakehuang
Copy link
Collaborator Author

@hakehuang - FYI #84051 (comment)

does this pr fix your issue?

together with #84050 they resolve both overflows and timing instability like these:

Time Measurements for multiq sched queues
Timing results: Clock frequency: 25 MHz
REC: sched.add.thread.ReadyQ_tail.min         - Add threads of decreasing priority, min.           :     387 cycles ,   15480 ns :
REC: sched.add.thread.ReadyQ_tail.max         - Add threads of decreasing priority, max.           : 4278577 cycles , 171143080 ns :
REC: sched.add.thread.ReadyQ_tail.avg         - Add threads of decreasing priority, avg.           :  384082 cycles , 15363280 ns :
REC: sched.add.thread.ReadyQ_tail.stddev      - Add threads of decreasing priority, stddev.        : 1220073 cycles , 48802920 ns :
ReadyQ.add.to.tail.0000.waiters          - Add thread of priority (0)     : 4261800 cycles , 170472023 ns
ReadyQ.add.to.tail.0001.waiters          - Add thread of priority (0)     :     387 cycles ,    15486 ns
ReadyQ.add.to.tail.0002.waiters          - Add thread of priority (0)     :     387 cycles ,    15490 ns
ReadyQ.add.to.tail.0003.waiters          - Add thread of priority (0)     :     387 cycles ,    15490 ns
ReadyQ.add.to.tail.0004.waiters          - Add thread of priority (0)     : 4261800 cycles , 170472000 ns
ReadyQ.add.to.tail.0005.waiters          - Add thread of priority (0)     :     387 cycles ,    15486 ns
ReadyQ.add.to.tail.0006.waiters          - Add thread of priority (0)     :     387 cycles ,    15486 ns

oh, that's a different issue, the max_tick can not be zero which is this PR to resolve

@hakehuang hakehuang requested review from golowanow and teburd and removed request for teburd January 21, 2025 03:16
@hakehuang
Copy link
Collaborator Author

@golowanow can you help to review and kindly approve this? @teburd seems not interest this area anymore

when CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC set to 960M
and CONFIG_SYS_CLOCK_TICKS_PER_SEC set to 100
the MAX_TICKS will be zero or even negative value, which is not
expected.
so need add a protection here downgrading the accuracy to
its as high as possible

also add build message to show that tickless has no effect

fixes: zephyrproject-rtos#36766

there used to be a workaround, not a fix,
either change the CONFIG_SYS_CLOCK_TICKS_PER_SEC=200
or
CONFIG_PM to set the CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC
to 32678

Signed-off-by: Hake Huang <hake.huang@oss.nxp.com>
@hakehuang
Copy link
Collaborator Author

@andyross , @golowanow please help to review and kindly approve. Thanks

@hakehuang hakehuang removed the request for review from teburd February 5, 2025 06:13
@hakehuang
Copy link
Collaborator Author

@andyross can you help to review again. just a small change on 32 bit compatible. Thanks

@hakehuang
Copy link
Collaborator Author

@andyross please help to review

@decsny decsny removed their request for review February 17, 2025 18:09
@hakehuang
Copy link
Collaborator Author

hakehuang commented Feb 19, 2025

@andyross please help to check, no big change as last time. Thanks a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tests-ci :kernel.tickless.concept.tickless_slice : test failed
5 participants