Skip to content

mssmt: add support for batch insertion and retrieval operations #451

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Roasbeef opened this issue Aug 16, 2023 · 0 comments · May be fixed by #1347
Open

mssmt: add support for batch insertion and retrieval operations #451

Roasbeef opened this issue Aug 16, 2023 · 0 comments · May be fixed by #1347

Comments

@Roasbeef
Copy link
Member

Background

Once an SMT starts to fill up a bit, you start to have a lot of keys on disk due to all the internal branches. One optimization we did in the past was to "compress" the set of empty branches from a non-zero branch to leaf by omitting writing them to disk all together. This helps in terms of the storage space and also query speed, but doesn't allow us to insert a set of leaves more quickly.

Each time we insert a new item, we need to "bubble up" the tree and update all the branch digests from that leaf to the root of the tree. In many cases, when we're inserting a batch of items, we'll end up re-computing the same internal node hash several times, and also redundantly writing the same internal nodes to disk. An ideal algorithm would take all the items to be inserted, then walk up the tree in series, updating each internal node as needed, finally only writing the root tree one time.

We should explore the addition of a set of batched APIs to the current interface to allow us to save on on-disk churn, and also speed up batch insertion operations.

Some relevant reading material follows:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 🔖 Ready
2 participants