Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDL display driver doesn't support API calls from a thread other than main #71410

Open
kartben opened this issue Apr 11, 2024 · 22 comments · May be fixed by #87883
Open

SDL display driver doesn't support API calls from a thread other than main #71410

kartben opened this issue Apr 11, 2024 · 22 comments · May be fixed by #87883
Assignees
Labels
area: Display area: native port Host native arch port (native_sim) bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug

Comments

@kartben
Copy link
Collaborator

kartben commented Apr 11, 2024

Describe the bug

Trying to call display functions (ex display_write) from a thread other than main does not seem to work.

To Reproduce
Steps to reproduce the behavior:

  1. change samples/drivers/display/src/main.c so that the contents of main() is effectively executed in a different thread
  2. build and run on native_sim --> screen is black

Expected behavior

SDL driver should work in a multithreaded environment, just like other display drivers.

Impact

can't run custom GUI code that requires a dedicated thread using native sim

Environment (please complete the following information):

Linux x86_64, 3.6.99 f419ea7

@kartben kartben added bug The issue is a bug, or the PR is fixing a bug area: native port Host native arch port (native_sim) area: Display labels Apr 11, 2024
@aescolar
Copy link
Member

@kartben can you provide a minimal repro case?

@kartben
Copy link
Collaborator Author

kartben commented Apr 12, 2024

@aescolar Tweaking the display driver sample like the below is the minimal repro. After applying the change, sample still works fine on e.g. M5Stack Core 2 (ILI9342C display controller), but SDL/native_sim_64 gives a black screen.

diff --git a/samples/drivers/display/src/main.c b/samples/drivers/display/src/main.c
index a9267b51c7..af14abc3b0 100644
--- a/samples/drivers/display/src/main.c
+++ b/samples/drivers/display/src/main.c
@@ -170,8 +170,24 @@ static inline void fill_buffer_mono10(enum corner corner, uint8_t grey,
 	fill_buffer_mono(corner, grey, 0xFFu, 0x00u, buf, buf_size);
 }
 
-int main(void)
+int gui_entry_point(void *arg1, void *arg2, void *arg3);
+
+K_THREAD_STACK_DEFINE(thread_stack, 4096);
+static struct k_thread thread;
+
+int main(void) {
+	k_thread_create(&thread, thread_stack, K_THREAD_STACK_SIZEOF(thread_stack),
+			(k_thread_entry_t)gui_entry_point, NULL, NULL, NULL,
+			K_PRIO_PREEMPT(5), 0, K_NO_WAIT);
+	return 0;
+}
+
+int gui_entry_point(void *arg1, void *arg2, void *arg3)
 {
+	ARG_UNUSED(arg1);
+	ARG_UNUSED(arg2);
+	ARG_UNUSED(arg3);
+	
 	size_t x;
 	size_t y;
 	size_t rect_w;

SDL documentation is pretty clear on the fact that e.g. SDL_RenderPresent() or SDL_PollEvent() should only be called from the thread that initialized the video system.

@aescolar
Copy link
Member

@aescolar
Copy link
Member

aescolar commented Apr 12, 2024

@kartben I can reproduce it with that example.

SDL documentation is pretty clear on the fact that e.g. SDL_RenderPresent() or SDL_PollEvent() should only be called from the thread that initialized the video system.

sdl_display_init() is run in the same "zephyr boot" thread that will eventually run the app main()
So it may be that you are right and that is the cause.
Also, as with a quick hack like this below, on top of your patch it renders properly (so sdl_display_init_bottom is called from the same thread that does the display updates):

diff --git a/drivers/display/display_sdl.c b/drivers/display/display_sdl.c
index f21288d1b0f..44013c5adfe 100644
--- a/drivers/display/display_sdl.c
+++ b/drivers/display/display_sdl.c
@@ -48,7 +48,10 @@ static inline uint32_t mono_pixel_order(uint32_t order)
        }
 }
 
-static int sdl_display_init(const struct device *dev)
+static int sdl_display_init(const struct device *dev) {
+       return 0;
+}
+int _sdl_display_init(const struct device *dev)
 {
        const struct sdl_display_config *config = dev->config;
        struct sdl_display_data *disp_data = dev->data;

diff --git a/samples/drivers/display/src/main.c b/samples/drivers/display/src/main.c
index a9267b51c74..1142d36ed30 100644
--- a/samples/drivers/display/src/main.c
+++ b/samples/drivers/display/src/main.c
@@ -189,6 +204,9 @@ int main(void)
        fill_buffer fill_buffer_fnc = NULL;
 
        display_dev = DEVICE_DT_GET(DT_CHOSEN(zephyr_display));
+       _sdl_display_init(display_dev);
        if (!device_is_ready(display_dev)) {
                LOG_ERR("Device %s not found. Aborting sample.",
                        display_dev->name);

Anyhow, SDL is a bit far from my expertise.
But I guess that if this is indeed the reason, a fix would be to move all the sdl display handling into the same thread.
I won't have time to do this now though.
So I better remove myself as assignee.

Maybe you or one of the display maintainers has the bandwith?

@aescolar aescolar assigned danieldegrasse and unassigned aescolar Apr 12, 2024
@aescolar aescolar added the priority: low Low impact/importance bug label Apr 12, 2024
jakkra added a commit to ZSWatch/ZSWatch that referenced this issue May 31, 2024
See issue for more details zephyrproject-rtos/zephyr#71410
Workaround is to run lv_task_handler from main thread and disable CONFIG_LV_Z_FLUSH_THREAD.
jakkra added a commit to ZSWatch/ZSWatch that referenced this issue May 31, 2024
See issue for more details zephyrproject-rtos/zephyr#71410
Workaround is to run lv_task_handler from main thread and disable CONFIG_LV_Z_FLUSH_THREAD.
jakkra added a commit to ZSWatch/ZSWatch that referenced this issue May 31, 2024
See issue for more details zephyrproject-rtos/zephyr#71410
Workaround is to run lv_task_handler from main thread and disable CONFIG_LV_Z_FLUSH_THREAD.
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Jun 12, 2024
@kartben kartben removed the Stale label Jun 12, 2024
ldab pushed a commit to ldab/ZSWatch that referenced this issue Jun 12, 2024
See issue for more details zephyrproject-rtos/zephyr#71410
Workaround is to run lv_task_handler from main thread and disable CONFIG_LV_Z_FLUSH_THREAD.
ldab pushed a commit to ldab/ZSWatch that referenced this issue Jun 12, 2024
See issue for more details zephyrproject-rtos/zephyr#71410
Workaround is to run lv_task_handler from main thread and disable CONFIG_LV_Z_FLUSH_THREAD.
@KimBP
Copy link
Contributor

KimBP commented Jul 16, 2024

@kartben I can reproduce it with that example.

SDL documentation is pretty clear on the fact that e.g. SDL_RenderPresent() or SDL_PollEvent() should only be called from the thread that initialized the video system.

sdl_display_init() is run in the same "zephyr boot" thread that will eventually run the app main() So it may be that you are right and that is the cause. Also, as with a quick hack like this below, on top of your patch it renders properly (so sdl_display_init_bottom is called from the same thread that does the display updates):

diff --git a/drivers/display/display_sdl.c b/drivers/display/display_sdl.c
index f21288d1b0f..44013c5adfe 100644
--- a/drivers/display/display_sdl.c
+++ b/drivers/display/display_sdl.c
@@ -48,7 +48,10 @@ static inline uint32_t mono_pixel_order(uint32_t order)
        }
 }
 
-static int sdl_display_init(const struct device *dev)
+static int sdl_display_init(const struct device *dev) {
+       return 0;
+}
+int _sdl_display_init(const struct device *dev)
 {
        const struct sdl_display_config *config = dev->config;
        struct sdl_display_data *disp_data = dev->data;

diff --git a/samples/drivers/display/src/main.c b/samples/drivers/display/src/main.c
index a9267b51c74..1142d36ed30 100644
--- a/samples/drivers/display/src/main.c
+++ b/samples/drivers/display/src/main.c
@@ -189,6 +204,9 @@ int main(void)
        fill_buffer fill_buffer_fnc = NULL;
 
        display_dev = DEVICE_DT_GET(DT_CHOSEN(zephyr_display));
+       _sdl_display_init(display_dev);
        if (!device_is_ready(display_dev)) {
                LOG_ERR("Device %s not found. Aborting sample.",
                        display_dev->name);

Anyhow, SDL is a bit far from my expertise. But I guess that if this is indeed the reason, a fix would be to move all the sdl display handling into the same thread. I won't have time to do this now though. So I better remove myself as assignee.

Maybe you or one of the display maintainers has the bandwith?

This "workaround" wont work if you also add lvgl into your application. lvgl_init() is called too early and will miss a display to work at.
Forcing lv_task_handler() to be called in context of main-thread like suggested by @jakkra I also find very limiting.
Wouldn't a solution be to (optionally) make the sdl-display-driver spawn a thread being responsible for all the SDL critical stuff like initializing 'video system' and calling RenderPresent etc?
I doubt the overhead of a thread in a native_sim application would cause any resource issues

@aescolar
Copy link
Member

@KimBP just to be clear: That (#71410 (comment)) was not a workaround, and I would very much discourage anybody from doing that. That comment was only meant to provide information about the possible cause of the issue.

Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@kartben
Copy link
Collaborator Author

kartben commented Nov 10, 2024

@Finomnis just in case you feel like spending more time on the SDL driver :-)

@Finomnis
Copy link
Contributor

Finomnis commented Nov 10, 2024

@kartben Might take a look at it. Although the only solution that comes to my mind is spawning a dedicated SDL thread.

Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Jan 10, 2025
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@PetervdPerk-NXP
Copy link
Collaborator

PetervdPerk-NXP commented Mar 24, 2025

Related to #87519, setting SDL_DISPLAY_USE_HARDWARE_ACCELERATOR=n gives a screen again calling from other threads. But then sdl input breaks, most likely due to resource starvation.

@Finomnis
Copy link
Contributor

Finomnis commented Mar 24, 2025

Related to #87519, setting SDL_DISPLAY_USE_HARDWARE_ACCELERATOR=n fixes screen calling from other threads. But then sdl input breaks.

I would intercept this and weaken it a little; we did not confirm that it actually fully breaks SDL input. It could be just in your project, where the priority of the render thread starves the input thread.

@Finomnis
Copy link
Contributor

Finomnis commented Mar 24, 2025

Also, be careful with the word "fixes", both are undefined behavior in the current implementation, it just happens to work better.

@PetervdPerk-NXP
Copy link
Collaborator

PetervdPerk-NXP commented Mar 24, 2025

Related to #87519, setting SDL_DISPLAY_USE_HARDWARE_ACCELERATOR=n fixes screen calling from other threads. But then sdl input breaks.

I would intercept this and weaken it a little; we did not confirm that it actually fully breaks SDL input. It could be just in your project, where the priority of the render thread starves the input thread.

Good point could it be that this is a problem in general?
Render thread being more important then input thread?
Your typical mouse cursor on desktop also has a high priority.

EDIT: my apologies for closing and re-opening. Pressed the wrong button.

@Finomnis
Copy link
Contributor

Good point could it be that this is a problem in general? Render thread being more important then input thread? Your typical mouse cursor on desktop also has a high priority.

Those priorities should be configurable. Maybe they should be reversed in your project?

@faxe1008
Copy link
Collaborator

I took some time yesterday and did some testing. An easy way to break rendering is to have an additional flush thread:

west build -b native_sim/native/64 -d build samples/modules/lvgl/demos -- -DCONFIG_LV_Z_FLUSH_THREAD=y

What I found is that rendering breaks the second writes are done from a different thread:

diff --git a/drivers/display/display_sdl.c b/drivers/display/display_sdl.c
index 7396b105052..9f65c4e44c9 100644
--- a/drivers/display/display_sdl.c
+++ b/drivers/display/display_sdl.c
@@ -8,6 +8,7 @@
 #define DT_DRV_COMPAT zephyr_sdl_dc

 #include <zephyr/drivers/display.h>
+#include <zephyr/kernel.h>

 #include <string.h>
 #include <stdlib.h>
@@ -38,6 +39,7 @@ struct sdl_display_data {
        enum display_pixel_format current_pixel_format;
        uint8_t *buf;
        uint8_t *read_buf;
+       k_tid_t sdl_thread_id;
 };

 static inline uint32_t mono_pixel_order(uint32_t order)
@@ -55,6 +57,7 @@ static int sdl_display_init(const struct device *dev)
        struct sdl_display_data *disp_data = dev->data;
        bool use_accelerator = true;
        LOG_DBG("Initializing display driver");
+    disp_data->sdl_thread_id = k_current_get();

        IF_DISABLED(CONFIG_SDL_DISPLAY_USE_HARDWARE_ACCELERATOR, (use_accelerator = false));

@@ -254,6 +257,10 @@ static int sdl_display_write(const struct device *dev, const uint16_t x,
        const struct sdl_display_config *config = dev->config;
        struct sdl_display_data *disp_data = dev->data;

+       if (disp_data->sdl_thread_id != k_current_get()) {
+               LOG_WRN_ONCE("Display write called from wrong thread");
+       }
+
        LOG_DBG("Writing %dx%d (w,h) bitmap @ %dx%d (x,y)", desc->width,
                        desc->height, x, y);

What I find strange is that currently even without the -DCONFIG_LV_Z_FLUSH_THREAD handling of input events also happens from another thread https://github.com/zephyrproject-rtos/zephyr/blob/main/boards/native/common/sdl/sdl_events.c#L13
But this seems to work?

Anyways I feel like all operations need to be happening in one thread:

  • Init of display (partially done in display_sdl_bottom.c and sdl_events.c, but the part within sdl_bottom seems to be the important one)
  • Handling of events (currently in sdl_events.c)
  • Display write operations

The input_sdl_touch binding should be ok since it only adds a callback within the init everything else pretty much happens when the callback is triggered.

Don't know what is the best solution for this - let the display driver have an own thread where init happens and a message queue for pending writes? Any ideas @Finomnis @aescolar

@aescolar
Copy link
Member

Don't know what is the best solution for this - let the display driver have an own thread where init happens and a message queue for pending writes?

Something like that makes sense to me.

@Finomnis
Copy link
Contributor

Finomnis commented Mar 27, 2025

I would probably implement it very similar to how the LVGL flush thread is implemented.

About priorities: does this thread need a high priority or will it automatically inherit the priority of the calling thread if they synchronize via semaphore?

@faxe1008
Copy link
Collaborator

Did a very dirty quick sketch for this:
faxe1008@f1f31bc

I find that this indeed fixes the rendering, would be intersting if this also fixes @PetervdPerk-NXP and @JarmouniA issues.

Keeping the message buffer in memory really will not be needed if we make sure that there is only write message at a time, so thats something I will cleanup before submitting.

About priorities: does this thread need a high priority or will it automatically inherit the priority of the calling thread if they synchronize via semaphore?

I think this will be a matter of priorities.

@Finomnis
Copy link
Contributor

@faxe1008 gotta be careful about recursion, the case where the input handlers get called from this thread which in turn causes a new write message. Might need to check if the calling thread is the SDL thread and then bypass the synchronization primitives.

faxe1008 added a commit to faxe1008/zephyr that referenced this issue Mar 31, 2025
Add a thread to the SDL display driver to ensure that init and renderer
calls are performed from the same thread.

Fixes zephyrproject-rtos#71410.

Signed-off-by: Fabian Blatz <fabianblatz@gmail.com>
faxe1008 added a commit to faxe1008/zephyr that referenced this issue Mar 31, 2025
Add a thread to the SDL display driver to ensure that init and renderer
calls are performed from the same thread.

Fixes zephyrproject-rtos#71410.

Signed-off-by: Fabian Blatz <fabianblatz@gmail.com>
faxe1008 added a commit to faxe1008/zephyr that referenced this issue Mar 31, 2025
Add a thread to the SDL display driver to ensure that init and renderer
calls are performed from the same thread.

Fixes zephyrproject-rtos#71410.

Signed-off-by: Fabian Blatz <fabianblatz@gmail.com>
faxe1008 added a commit to faxe1008/zephyr that referenced this issue Mar 31, 2025
Add a thread to the SDL display driver to ensure that init and renderer
calls are performed from the same thread.

Fixes zephyrproject-rtos#71410.

Signed-off-by: Fabian Blatz <fabianblatz@gmail.com>
faxe1008 added a commit to faxe1008/zephyr that referenced this issue Mar 31, 2025
Add a thread to the SDL display driver to ensure that init and renderer
calls are performed from the same thread.

Fixes zephyrproject-rtos#71410.

Signed-off-by: Fabian Blatz <fabianblatz@gmail.com>
faxe1008 added a commit to faxe1008/zephyr that referenced this issue Mar 31, 2025
Add a thread to the SDL display driver to ensure that init and renderer
calls are performed from the same thread.

Fixes zephyrproject-rtos#71410.

Signed-off-by: Fabian Blatz <fabianblatz@gmail.com>
faxe1008 added a commit to faxe1008/zephyr that referenced this issue Mar 31, 2025
Add a thread to the SDL display driver to ensure that init and renderer
calls are performed from the same thread.

Fixes zephyrproject-rtos#71410.

Signed-off-by: Fabian Blatz <fabianblatz@gmail.com>
faxe1008 added a commit to faxe1008/zephyr that referenced this issue Mar 31, 2025
Add a thread to the SDL display driver to ensure that init and renderer
calls are performed from the same thread.

Fixes zephyrproject-rtos#71410.

Signed-off-by: Fabian Blatz <fabianblatz@gmail.com>
faxe1008 added a commit to faxe1008/zephyr that referenced this issue Apr 1, 2025
Add a thread to the SDL display driver to ensure that init and renderer
calls are performed from the same thread.

Fixes zephyrproject-rtos#71410.

Signed-off-by: Fabian Blatz <fabianblatz@gmail.com>
faxe1008 added a commit to faxe1008/zephyr that referenced this issue Apr 1, 2025
Add a thread to the SDL display driver to ensure that init and renderer
calls are performed from the same thread.

Fixes zephyrproject-rtos#71410.

Signed-off-by: Fabian Blatz <fabianblatz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Display area: native port Host native arch port (native_sim) bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants