Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native_sim] Zephyr 4.1 breaks GPIO_EMUL_SDL and INPUT_SDL_TOUCH #87519

Open
PetervdPerk-NXP opened this issue Mar 23, 2025 · 27 comments
Open

[native_sim] Zephyr 4.1 breaks GPIO_EMUL_SDL and INPUT_SDL_TOUCH #87519

PetervdPerk-NXP opened this issue Mar 23, 2025 · 27 comments
Assignees
Labels
area: Input Input Subsystem and Drivers area: native port Host native arch port (native_sim) bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug Regression Something, which was working, does not anymore

Comments

@PetervdPerk-NXP
Copy link
Collaborator

Describe the bug

I've got this application that runs natively on the NXP FRDM-MCXN947 but also has a native/posix/64 target.
https://github.com/NXPHoverGames/Doom-MCX
When upgrading from Zephyr 4.0.0 to Zephyr 4.1.0 the input (GPIO SDL emulation and SDL touch) for native/posix/64 is not working anymore.

To Reproduce

# initialize  for the example-application (main branch)
west init -m https://github.com/nxphovergames/doom-mcx --mr main doom-mcx-workspace
# update Zephyr modules
cd doom-mcx-workspace
west update
cd doom-mcx
west build -p always -b native_sim/native/64
./build/zephyr/zephyr.exe

When modifying west.yml to point to Zephyr v4.0.0 8469084dfae85f854555f0607f2c838dad097235
Then input is working again

Impact
No SDL input on native_sim

Environment (please complete the following information):

  • OS: (Ubuntu 22.04 under wsl2)
  • Toolchain (Zephyr sdk 0.17.0)
@PetervdPerk-NXP PetervdPerk-NXP added the bug The issue is a bug, or the PR is fixing a bug label Mar 23, 2025
@faxe1008
Copy link
Collaborator

Does it work with CONFIG_INPUT=y?

@PetervdPerk-NXP
Copy link
Collaborator Author

CONFIG_INPUT=y already, see

Image

Again the same code is working fine on native hardware.

@henrikbrixandersen henrikbrixandersen added the area: native port Host native arch port (native_sim) label Mar 23, 2025
@aescolar
Copy link
Member

CC @fabiobaltieri @pdgendt

@aescolar aescolar added the area: Input Input Subsystem and Drivers label Mar 24, 2025
@aescolar aescolar assigned fabiobaltieri and unassigned aescolar Mar 24, 2025
@fabiobaltieri
Copy link
Member

Can you bisect this?

@PetervdPerk-NXP
Copy link
Collaborator Author

PetervdPerk-NXP commented Mar 24, 2025

Can you bisect this?

I think I've narrowed it down this PR #81184
The commit 86a126d before is working fine.

EDIT: Reverting 02d562e on top Zephyr 4.1 gives me input again.

@JarmouniA
Copy link
Collaborator

FYI @Finomnis

@fabiobaltieri fabiobaltieri added the Regression Something, which was working, does not anymore label Mar 24, 2025
@Finomnis
Copy link
Contributor

Huh? I didn't change anything regarding the input, at least not that I know of...

Very weird.

I'm sadly very limited on time right now, would someone else mind dissecting that change in more detail to see which line exactly broke it?

@faxe1008
Copy link
Collaborator

faxe1008 commented Mar 24, 2025

@PetervdPerk-NXP your code does not build for me:

/tmp/doom-mcx-workspace/doom-mcx/prboom2/src/hu_stuff.c: In function ‘HU_Start’:
/tmp/doom-mcx-workspace/doom-mcx/prboom2/src/hu_stuff.c:372:19: error: passing argument 5 of ‘HUlib_initSText’ from incompatible pointer type [-Wincompatible-pointer-types]
  372 |                 _g->hu_font,
      |                 ~~^~~~~~~~~
      |                   |
      |                   const patch_t **
In file included from /tmp/doom-mcx-workspace/doom-mcx/prboom2/src/hu_stuff.c:39:
/tmp/doom-mcx-workspace/doom-mcx/prboom2/src/hu_lib.h:175:18: note: expected ‘const patch_t *’ but argument is of type ‘const patch_t **’
  175 |   const patch_t *font,
      |   ~~~~~~~~~~~~~~~^~~~
/tmp/doom-mcx-workspace/doom-mcx/prboom2/src/hu_stuff.c:384:19: error: passing argument 4 of ‘HUlib_initTextLine’ from incompatible pointer type [-Wincompatible-pointer-types]
  384 |                 _g->hu_font,
      |                 ~~^~~~~~~~~
      |                   |
      |                   const patch_t **
/tmp/doom-mcx-workspace/doom-mcx/prboom2/src/hu_lib.h:152:18: note: expected ‘const patch_t *’ but argument is of type ‘const patch_t **’
  152 |   const patch_t *f,
      |   ~~~~~~~~~~~~~~~^

Is this somehow reproducible with an intree-sample?

EDIT:// Hmm looking at the resulting autoconf.h tho it seems like CONFIG_LV_Z_POINTER_INPUT is not enabled for me, which might be the culprit. Can you check this on your end with a working build @PetervdPerk-NXP ?

EDIT-2:// @PetervdPerk-NXP try adding CONFIG_LVGL=y this is missing from your prj.conf: https://github.com/NXPHoverGames/Doom-MCX/blob/main/prj.conf
If that fixes your issue, I fail to understand what the commit you linked has to do with it.

@PetervdPerk-NXP
Copy link
Collaborator Author

PetervdPerk-NXP commented Mar 24, 2025

Is this somehow reproducible with an intree-sample?

EDIT:// Hmm looking at the resulting autoconf.h tho it seems like CONFIG_LV_Z_POINTER_INPUT is not enabled for me, which might be the culprit. Can you check this on your end with a working build @PetervdPerk-NXP ?

EDIT-2:// @PetervdPerk-NXP try adding CONFIG_LVGL=y this is missing from your prj.conf: https://github.com/NXPHoverGames/Doom-MCX/blob/main/prj.conf If that fixes your issue, I fail to understand what the commit you linked has to do with it.

  1. There's no in-tree sample that only uses input it seems
  2. I'm not using LVGL just zephyr framebuffer and zephyr input, this works without LVGL
  3. I tried CONFIG_LGVL=y, doesn't fix it.
  4. The error/warning you're showing when compiling is a warning treated as error, which compiler are you using? kinda weird though it enforces this now. You could try something like set(CMAKE_COMPILE_WARNING_AS_ERROR OFF)

@Finomnis
Copy link
Contributor

Finomnis commented Mar 24, 2025

@PetervdPerk-NXP would it be possible for you to strip down that project to a minimal reproducible example? Which demonstrates the problem and works if you revert said commit? So we can argue about it more focused?

@PetervdPerk-NXP
Copy link
Collaborator Author

PetervdPerk-NXP commented Mar 24, 2025

@PetervdPerk-NXP would it be possible for you to strip down that project to a minimal reproducible example? Which demonstrates the problem and works of you revert said commit? So we can argue about it more focused?

Kinda hard to do tbh, overall a simple fb with input sample is missing unfortunately.

Btw I've narrowed it down to the line causing it.

SDL_SetTextureBlendMode(*texture, SDL_BLENDMODE_BLEND);

Uncommenting this gives me input again.

EDIT2: @faxe1008 if you could share the cmake output showcasing the GCC version that would help me fixing the compile error you're having.

@Finomnis
Copy link
Contributor

Finomnis commented Mar 24, 2025

OK wtf :D if that is true then something very weird is going on.

Do we cause undefined behavior somewhere?

@Finomnis
Copy link
Contributor

@PetervdPerk-NXP are you aware that the SDL driver is not multi threading capable? If you interact with the SDL driver from multiple threads, that might cause undefined behavior. Just checking, this is an open issue at the moment if I remember right.

@PetervdPerk-NXP
Copy link
Collaborator Author

@PetervdPerk-NXP are you aware that the SDL driver is not multi threading capable? If you interact with the SDL driver from multiple threads, that might cause undefined behavior. Just checking, this is an open issue at the moment if I remember right.

I do have 2 threads but that's working since like zephyr 3.6.0, however I'm running with SDL_DISPLAY_USE_HARDWARE_ACCELERATOR=n which was always fine.

I'm thinking it could be perf related thing, without SDL_SetTextureBlendMode disabled CPU load is ~60% and with SDL_SetTextureBlendMode enabled it's around ~95%.

@Finomnis
Copy link
Contributor

Finomnis commented Mar 24, 2025

So what happens if you SDL_DISPLAY_USE_HARDWARE_ACCELERATOR=y?

@PetervdPerk-NXP
Copy link
Collaborator Author

Then I don't have a screen due to multiple threads, just a gray background frames don't get updated.

Hacking my code to be single threaded I get a screen again and input works. But Cpu load skyrockets to 600% (multi-core) though.

So my hypothesis, SDL_DISPLAY_USE_HARDWARE_ACCELERATOR=n, is just single threaded and the alpha blending introduced #81184 eats so much cpu thus blocking input events.

@Finomnis
Copy link
Contributor

Hmm. That sucks.

I think the proper fix would be to finally tackle #71410.

@Finomnis
Copy link
Contributor

Also, why do you get such a high CPU load? I'm pretty sure SDL uses OpenGL if available, there should not be much CPU involved in the SDL draw calls.

@Finomnis
Copy link
Contributor

I mean we could add a kconfig option that disables transparency, that would be another "fix". But for your case it feels more like a workaround tbh.

@PetervdPerk-NXP
Copy link
Collaborator Author

I'm okay keeping this open for now and linking to #71410 when that gets fixed this one would be fixed as well.

Regarding cpu load, no clue maybe due to wsl2 that does opengl software rendering in the end.

@fabiobaltieri
Copy link
Member

So my hypothesis, SDL_DISPLAY_USE_HARDWARE_ACCELERATOR=n, is just single threaded and the alpha blending introduced #81184 eats so much cpu thus blocking input events.

Is it on LVGL or on the zephyr threads? Not sure this changes anything in native_sim but the input thread has a very low priority by default, maybe try

CONFIG_INPUT_THREAD_PRIORITY_OVERRIDE=y
CONFIG_INPUT_THREAD_PRIORITY=-2

@Finomnis
Copy link
Contributor

Is it on LVGL or on the zephyr threads?

@fabiobaltieri No lvgl involved here, to my understanding.

@fabiobaltieri
Copy link
Member

fabiobaltieri commented Mar 25, 2025

@fabiobaltieri No lvgl involved here, to my understanding.

Right, does SDL have a thread that could starve everything else? Though IIUC on native_sim all Zephyr threads map to pthread threads, doubt it would just starve the input thread to the point it would never run.

Only SDK thread I see is the one in boards/native/common/sdl/sdl_events.c and has a sleep in the loop, so yeah doubt there's any problem there.

@Finomnis
Copy link
Contributor

Finomnis commented Mar 25, 2025

That's why I feel like it's a user thread priority problem. OP states that they have two threads, so I assume one for rendering and one for input. I suspect the rendering thread has a higher priority than the input thread.

@PetervdPerk-NXP
Copy link
Collaborator Author

That's why I feel like it's a user thread priority problem. OP states that they have two threads, so I assume one for rendering and one for input. I suspect the rendering thread has a higher priority than the input thread.

Well this the fun thing I don't have special input thread. Main thread samples input periodically and there's an input callback for touch. No clue how zephyr handles this, but again on native hardware this works fine.

I think it's more related to the sdl implementation itself not forwarding input events.

@nashif nashif added the priority: medium Medium impact/importance bug label Mar 25, 2025
@PetervdPerk-NXP
Copy link
Collaborator Author

So my hypothesis, SDL_DISPLAY_USE_HARDWARE_ACCELERATOR=n, is just single threaded and the alpha blending introduced #81184 eats so much cpu thus blocking input events.

Is it on LVGL or on the zephyr threads? Not sure this changes anything in native_sim but the input thread has a very low priority by default, maybe try

CONFIG_INPUT_THREAD_PRIORITY_OVERRIDE=y
CONFIG_INPUT_THREAD_PRIORITY=-2

Thank you for the suggestion, just tried this but doesn't fix the input.

@Finomnis
Copy link
Contributor

Finomnis commented Mar 26, 2025

Well this the fun thing I don't have special input thread. Main thread samples input periodically and there's an input callback for touch.

So your main thread also renders the image and draws it on screen? You mention you have two threads, what does the other thread do?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Input Input Subsystem and Drivers area: native port Host native arch port (native_sim) bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug Regression Something, which was working, does not anymore
Projects
None yet
Development

No branches or pull requests

8 participants