You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Once an SMT starts to fill up a bit, you start to have a lot of keys on disk due to all the internal branches. One optimization we did in the past was to "compress" the set of empty branches from a non-zero branch to leaf by omitting writing them to disk all together. This helps in terms of the storage space and also query speed, but doesn't allow us to insert a set of leaves more quickly.
Each time we insert a new item, we need to "bubble up" the tree and update all the branch digests from that leaf to the root of the tree. In many cases, when we're inserting a batch of items, we'll end up re-computing the same internal node hash several times, and also redundantly writing the same internal nodes to disk. An ideal algorithm would take all the items to be inserted, then walk up the tree in series, updating each internal node as needed, finally only writing the root tree one time.
We should explore the addition of a set of batched APIs to the current interface to allow us to save on on-disk churn, and also speed up batch insertion operations.
Background
Once an SMT starts to fill up a bit, you start to have a lot of keys on disk due to all the internal branches. One optimization we did in the past was to "compress" the set of empty branches from a non-zero branch to leaf by omitting writing them to disk all together. This helps in terms of the storage space and also query speed, but doesn't allow us to insert a set of leaves more quickly.
Each time we insert a new item, we need to "bubble up" the tree and update all the branch digests from that leaf to the root of the tree. In many cases, when we're inserting a batch of items, we'll end up re-computing the same internal node hash several times, and also redundantly writing the same internal nodes to disk. An ideal algorithm would take all the items to be inserted, then walk up the tree in series, updating each internal node as needed, finally only writing the root tree one time.
We should explore the addition of a set of batched APIs to the current interface to allow us to save on on-disk churn, and also speed up batch insertion operations.
Some relevant reading material follows:
The text was updated successfully, but these errors were encountered: