
fix(jsonrpc): fix deadlock in get_all_accounts() #6726

Merged
merged 1 commit, Apr 1, 2025

Conversation

@link2xt link2xt commented Apr 1, 2025

`self.accounts.read().await.get_all()` acquires a read lock
and does not release it until the end of the `for` loop.
After that, a writer may get into the queue,
e.g. because of a concurrent `add_account` call.
In this case `let context_option = self.accounts.read().await.get_account(id);`
tries to acquire another read lock and deadlocks,
because the tokio RwLock is write-preferring and will not
grant another read lock while there is a writer in the queue.
At the same time, the writer never gets the write lock,
because the first read lock is not released.

The fix is to take a single read lock
for the whole `get_all_accounts()` call.

This is described in <https://docs.rs/tokio/1.44.1/tokio/sync/struct.RwLock.html#method.read>:
"Note that under the priority policy of RwLock, read locks are not
granted until prior write locks, to prevent starvation. Therefore
deadlock may occur if a read lock is held by the current task, a write
lock attempt is made, and then a subsequent read lock attempt is made by
the current task."

@link2xt link2xt requested review from iequidoo and Simon-Laux April 1, 2025 11:02
link2xt commented Apr 1, 2025

This is me trying to fix deltachat/deltachat-desktop#4842 (comment)

@link2xt link2xt marked this pull request as draft April 1, 2025 11:25
@link2xt link2xt force-pushed the link2xt/get_all_accounts-lock branch from e2c68ab to d4bbe46 April 1, 2025 11:32
link2xt commented Apr 1, 2025

I still have no idea why it fixes anything, but now we have a regression test that reliably gets stuck without the fix.

@link2xt link2xt marked this pull request as ready for review April 1, 2025 11:33
@link2xt link2xt force-pushed the link2xt/get_all_accounts-lock branch 2 times, most recently from 893d851 to 9cac074 April 1, 2025 11:49
WofWca commented Apr 1, 2025

> I still have no idea why it fixes anything

As someone who doesn't really know the details of how JSON-RPC works, I have a hand-wavy hypothesis (it could be complete gibberish).

Maybe it has to do with sequential handling of JSON-RPC requests (I'm not sure that is actually a thing). `get_all_accounts` keeps acquiring the read lock in a loop, so in theory `select_account` could acquire the write lock while `get_all_accounts` is in the middle of execution.
The problem would then arise if JSON-RPC (hypothetically) won't send the response to `select_account` until `get_all_accounts` has finished, while `get_all_accounts` can't finish until `select_account` has.
This is also supported by the fact that, as I mentioned in deltachat/deltachat-desktop#4842 (comment), the JSON-RPC server stops responding to all requests (though it keeps sending events).

Thanks for fixing it!

link2xt commented Apr 1, 2025

> though it will keep sending events

Events are also requests; they are received by calling `get_next_event()`.

Also, JSON-RPC requests can run in parallel; there is even a test for this that runs `sleep` calls concurrently.

```diff
@@ -227,8 +227,9 @@ impl CommandApi {
     /// Get a list of all configured accounts.
     async fn get_all_accounts(&self) -> Result<Vec<Account>> {
         let mut accounts = Vec::new();
         for id in self.accounts.read().await.get_all() {
             let context_option = self.accounts.read().await.get_account(id);
```
@link2xt link2xt Apr 1, 2025

I added some debug logging, and it turns out `get_all_accounts` gets stuck trying to take a lock here.

I think the problem is that the first `read()` gets the lock, then `add_account()` gets into the queue for the write lock, and then this line tries to take another read lock. But the tokio RwLock is write-preferring and will not admit another reader while there is a writer in the queue: https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html
So the writer waits forever, and the first read lock is also never released by `get_all_accounts`, because it is waiting for the second read lock.

It's not obvious, but I think the first `read()` lock is not released until the end of the `for` loop.
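
For reference, a sketch of the single-lock version; the structure follows the commit message, but the loop body is paraphrased and `Account::from_context` is only assumed here:

```rust
/// Get a list of all configured accounts.
async fn get_all_accounts(&self) -> Result<Vec<Account>> {
    let mut accounts = Vec::new();
    // Take one read guard up front and hold it for the whole call,
    // so no writer can queue up between two read() acquisitions.
    let accounts_read = self.accounts.read().await;
    for id in accounts_read.get_all() {
        // Reuse the guard instead of calling self.accounts.read() again.
        if let Some(ctx) = accounts_read.get_account(id) {
            // Hypothetical conversion; the real loop body may differ.
            accounts.push(Account::from_context(&ctx, id).await?);
        }
    }
    Ok(accounts)
}
```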

Collaborator

I think you're right. The https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html#method.read docs also mention this:

> deadlock may occur if a read lock is held by the current task, a write lock attempt is made, and then a subsequent read lock attempt is made by the current task.

Collaborator Author

I added this reference to the commit message.

@link2xt link2xt force-pushed the link2xt/get_all_accounts-lock branch from 9cac074 to 7f6d444 April 1, 2025 12:27
@link2xt link2xt changed the title fix: simplify locking in JSON-RPC get_all_accounts() call fix(jsonrpc): fix deadlock in get_all_accounts() Apr 1, 2025
@link2xt link2xt force-pushed the link2xt/get_all_accounts-lock branch from 7f6d444 to 8098444 April 1, 2025 12:29
@WofWca WofWca left a comment

Really great job with the debugging!

@Hocuri Hocuri left a comment

Nice!

iequidoo commented Apr 2, 2025

Btw, we could use something like https://docs.rs/timed-locks/latest/timed_locks/ just for safety; see the Motivation paragraph there, it is exactly about deadlocks. Of course we need errors, not panics, which adds complexity, but if the alternative is hanging the whole program, it may make sense. Not a Rust invention, of course; some C libraries implementing their own locking have the same kind of locks, and it looks usable to me.
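
A hand-rolled sketch of that idea, assuming nothing beyond tokio and anyhow (this is not the timed-locks API; `read_or_timeout` is a hypothetical helper):

```rust
use std::time::Duration;
use tokio::sync::{RwLock, RwLockReadGuard};
use tokio::time::timeout;

/// Hypothetical helper: return an error instead of hanging forever
/// when the read lock cannot be acquired within `dur`.
async fn read_or_timeout<T>(
    lock: &RwLock<T>,
    dur: Duration,
) -> anyhow::Result<RwLockReadGuard<'_, T>> {
    timeout(dur, lock.read())
        .await
        .map_err(|_| anyhow::anyhow!("possible deadlock: read lock not acquired within {dur:?}"))
}
```

A caller could then surface the error through JSON-RPC instead of wedging the whole server.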

link2xt commented Apr 2, 2025

Timeouts are not good for the tests: if CI runs in a VM that may be paused, the timeouts will fire randomly whenever the runner is suspended at the wrong moment.

parking_lot has experimental deadlock detection, but I don't know how it works, and it is probably experimental for a reason. It is also not async.

There is an async lock, https://docs.rs/async-lock/latest/async_lock/struct.RwLock.html, but it seems to be the same as tokio's: "write-preferring".

iequidoo commented Apr 2, 2025

We could use different timeouts depending on configuration; there is RwLock::new_with_timeout() there, so CI would still detect deadlocks. I'd avoid read-preferring locks, or at least we would need to check the code to make sure we don't starve writers. Moreover, read-preferring locks don't solve the deadlock problem completely: one can still produce an ABBA deadlock with two mutexes. Maybe an RwLock with the following policy would help: a reader may acquire the lock if the oldest existing (holding or waiting) lock is a reader too; if the oldest lock is a writer, the reader must wait. But I'm not aware of existing implementations.
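
For context, the ABBA pattern mentioned above, as a minimal sketch with two tokio mutexes; the timeout exists only to make the hang observable instead of wedging the process:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;
use tokio::time::{sleep, timeout};

#[tokio::main]
async fn main() {
    let a = Arc::new(Mutex::new(()));
    let b = Arc::new(Mutex::new(()));

    // Task 1 locks A, then B.
    let (a1, b1) = (Arc::clone(&a), Arc::clone(&b));
    let t1 = tokio::spawn(async move {
        let _ga = a1.lock().await;
        sleep(Duration::from_millis(50)).await; // ensure task 2 holds B by now
        let _gb = b1.lock().await; // waits for task 2 forever
    });

    // Task 2 locks B, then A.
    let (a2, b2) = (Arc::clone(&a), Arc::clone(&b));
    let t2 = tokio::spawn(async move {
        let _gb = b2.lock().await;
        sleep(Duration::from_millis(50)).await; // ensure task 1 holds A by now
        let _ga = a2.lock().await; // waits for task 1 forever
    });

    // Each task holds one lock and waits for the other: classic ABBA deadlock.
    let hung = timeout(Duration::from_secs(1), async {
        let _ = t1.await;
        let _ = t2.await;
    })
    .await
    .is_err();
    assert!(hung, "expected the two tasks to deadlock");
}
```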

WofWca commented Apr 4, 2025

It appears that the bug this PR fixes is also the reason why our E2E CI has been failing from time to time:

https://github.com/deltachat/deltachat-desktop/actions/runs/14255114113/job/39956261540

I thought it was because clicking "paste" somehow does nothing. But if you look at the screenshot, you can see the good old hourglass being displayed on the new account.

[screenshot: the new account stuck showing the hourglass indicator]
