Skip to content

sqlite: better-sqlite3 shouldn't be the default driver (as is), nor be used if more than one sqlite instance or other DBs are also used #1385

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
DaemonSnake opened this issue Mar 13, 2025 · 5 comments
Labels
breaking change Includes breaking changes built-in dialect Related to a built-in dialect sqlite Related to sqlite

Comments

@DaemonSnake
Copy link

Hi,

As you know better-sqlite3 is the default sqlite drive in Kysely and is synchronous.

The issue with this is that it blocks the full event look during each query.
This make kysely async interface of it quite unorthodox as most Db clients async interfaces are expected to offload query execution outside the event loop.
Also better-sqlite3 is an opinionated client,
as can testify the When is this library not appropriate? ( https://www.npmjs.com/package/better-sqlite3 ).
It isn't meant to be used with many sqlite DBs and combined with async code.

This makes it a poor fit for a default driver of kysely currently.

The consequence here is that it can degrades overall performance of the application and unrelated DBs.
In particular when the project:

  • uses multiple sqlite
  • contains some slow / big queries (for instance filling in temporary sqlite DB for local processing)
  • have many small queries
    etc.

The impact when this occurs can be catastrophic and very hard to find/debug.
For instance in our case it would:

  • block unrelated pending postgresql transactions (commits getting delayed)
    This would then cause oldest_xoid_transaction_age to increase,
    degrading postgres performance and create very weird bugs in completely unrelated projects.
  • same thing for queries on unrelated Sqlite DBs that would become much slower or look stuck
  • prevent proper application sigterm handling / cleanup (causing incomplete / bad states)
  • second long event loop delay almost constantly depending on the number of sqlite Dbs using better-sqlite3
    just to mention the main ones.

This also goes against the trend of creating many small isolated sqlite DB (for instance 1 per user).
Reason being that calls to 3 unrelated DBs are done is series currently instead of in parallel.

We found the issue a while back and replace the default drive in one of place where we create temporary sqlite Dbs
by using a custom dialect using the async lib sqlite3 ( https://www.npmjs.com/package/sqlite3 ) with a simple async mutex to prevent concurrent queries.

We thought this would be enough but forgot to migrate another cases where, we thought, it wouldn't be necessary.
After adding a query_timeout to postgresql's pool config of 1min, we discovered it was not so.

We made sure it was the source of the issue
by creating an sqlite extension that adds a function that sleep for 1 min.
Added an interval that logs and ran a query that uses the above extension.
We confirmed this way that better-sqlite queries block the event loop.

We replaced the default driver with our custom one and the before/after speaks for itself:
Image
Image

We just found the kysely-sqlite-tools repo that contains a
https://github.com/subframe7536/kysely-sqlite-tools/tree/master
we are likely to replace our custom driver by this one as it address the issue.

Thanks for your attention

@igalklebanov igalklebanov added sqlite Related to sqlite built-in dialect Related to a built-in dialect breaking change Includes breaking changes labels Mar 14, 2025
@koskimas
Copy link
Member

koskimas commented Mar 15, 2025

You make a good point. We could expand the sqlite dialect to support both better-sqlite3 and some async alternative. And document the difference clearly.

@BurningEnlightenment
Copy link

This makes it a poor fit for a default driver of kysely currently.
[…]
In particular when the project:
[…]

  • have many small queries

Citation needed. better-sqlite3 will usually perform a lot better in the case of one (worker_ )thread per database context due to the overhead caused by scheduling lots of small tasks on the thread pool which in turn will all fight over the sqlite3 db context mutex.

Due to said db context mutex you can also starve the nodejs thread pool by issuing a long running query and many small queries afterwards on the same db context.

See also

This also goes against the trend of creating many small isolated sqlite DB (for instance 1 per user).

I am unaware of any statistic which could be used to prove that this is indeed the or even a trend.

@DaemonSnake
Copy link
Author

Citation needed. better-sqlite3 will usually perform a lot better ...

The issue here is not about better-sqlite3 query performance.
It's regarding better-sqlite3's negative impact on an application overall performance / availability / reliability.
It trades-off availability and concurrency for increased processing speed on at the expense of other tasks.
Which is the opposite of the tenants of async code, where increased availability / concurrency is reached at the cost of decreased scheduling/memory efficiency.
Which is why it should be opted-in knowingly.

in the case of one (worker_ )thread per database context due to the overhead caused by scheduling lots of small tasks on the thread pool which in turn will all fight over the sqlite3 db context mutex.
Due to said db context mutex you can also starve the nodejs thread pool by issuing a long running query and many small queries afterwards on the same db context.

The approach described is not what we are doing.
We don't argue for it to be a default approach either.
But the starvation issue is also simple to fix.

For reference, the naive approach we used to fixed our issues is the following:
We use https://www.npmjs.com/package/sqlite3 as mentioned previously.
When creating a new DB, It starts a native C++ thread (so outside nodejs' thread pool) running sqlite3.
Our driver has a single connection protected by a basic node-js async mutext of the form:

async aquireLock() {
  while (this.lock) {
   await this.lock;
  }
  this.lock = new Promise(resolve => { this.resolve = resolve; });
}

release() {
   if (this.resolve) {
    const resolve = this.resolve;
    this.resolve = null;
    this.lock = null;
    resolve();
  }
}

This way the sqlite thread only handles 1 query at the time and pending queries at kept as promises.
To limit scheduling hell we limit the max number of sqlite3 instances/native threads to 1-2 per cores per server.

Thanks to this:

  • single sqlite DB doesn't have to deal with parallelism (keeping sqlite's single threaded efficiency)
  • different sqlite DBs aren't competing for the same thread (isolation, better ressource exploitation)
  • no locking of event loop (availability / concurrency)

We also didn't notice any performance penalty in CPU/RAM usage (last CPU spikes in fact)
and moved from an average event loop delay of ~1s to ~80ms

For the approach to be generalizable it would require a way to limit the total number of threads.
Solutions could include:

  • single global native/worker thread managing/starting multiple sqlite3 instances with a single connection via async mutext (best if we just want to prevent event loop blocking)
  • thread/worker pool with sharding,
  • etc.

This also goes against the trend of creating many small isolated sqlite DB (for instance 1 per user).

I am unaware of any statistic which could be used to prove that this is indeed the or even a trend.

the in that context was meant as a and not THE.
As for some meaningful examples, BlueSky uses 1 sqlite file per user:
bluesky-social/atproto#1705

FLY.io article on sqlite for single tenant:
https://fly.io/javascript-journal/single-tenant-sqlite-in-tigris/

blog post on the subject by Turso:
https://turso.tech/blog/give-each-of-your-users-their-own-sqlite-database-b74445f4

referencing this DHH post about using sqlite for single-tenant:
https://world.hey.com/dhh/multi-tenancy-is-what-s-hard-about-scaling-web-services-dd1e0e81

on the reach side of things, reaction from ThePrimeagen to said article:
https://www.youtube.com/watch?v=AtZiAU9-A5A

@igalklebanov
Copy link
Member

Hey 👋

We also use Turso for db-per-tenant in my team:
https://turso.tech/blog/why-prisma-chose-turso-to-power-prisma-optimize

@kysely-org kysely-org deleted a comment Apr 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Includes breaking changes built-in dialect Related to a built-in dialect sqlite Related to sqlite
Projects
None yet
Development

No branches or pull requests

4 participants