I remember seeing this in the implementation and thinking the same thing, but it never surfaced as an issue for us. Probably because, with continuous maintenance, N was small, so even though the work was N^2 it didn't end up making much of a difference.
I think it makes total sense to fix this, though (it's entirely possible for snapshots to accumulate because expiration couldn't run for a while, for whatever reason). I agree with the proposed fix: we should just pass in the whole set of snapshots to remove. The server-side commit path would be naturally more reliable as a result.
Proposed Change
Problem
We found an issue when expiring old snapshots from a table with a lot of snapshots (10k+). The issue happens when the expireSnapshots action triggers a single UpdateTableRequest that will clean up most of the snapshots (~99%). The REST Catalog server receives one request (UpdateTableRequest) with the list of snapshots to be removed from the metadata. However, the metadata is updated once per snapshot:
iceberg/core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java, line 435 (commit 03ff41c)
The applyTo method only updates the table metadata for a single snapshot, even though it could receive the entire list of snapshots:
iceberg/core/src/main/java/org/apache/iceberg/MetadataUpdate.java, line 344 (commit 03ff41c)
The removeSnapshots method delegates the work to rewriteSnapshotInternal:
iceberg/core/src/main/java/org/apache/iceberg/TableMetadata.java, line 1424 (commit 03ff41c)
This rewriteSnapshotInternal iterates over all snapshots of the table to remove the provided list of snapshots (which, as mentioned, contains only one snapshot per update). This is inefficient: when we need to remove a huge number of snapshots, we iterate over the entire list of N snapshots N times, i.e. O(N^2) work.
We noticed this issue when we recently enabled this on some tables that are written by streaming jobs (they are written to often and generate a lot of snapshots). With 10+ tables, each holding 10k+ snapshots (some over 100k), our REST Catalog server constantly hits 100% CPU usage:
Proposal
On the CatalogHandlers, we could group the updates by MetadataUpdate type and apply them in bulk:
iceberg/core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java, line 435 (commit 03ff41c)
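A rough sketch of the grouping idea, using simplified stand-in types (the real MetadataUpdate and TableMetadata builder APIs differ): group the incoming updates by their class, then collect every snapshot ID from the remove-snapshot group so a single bulk removal can do one O(N) pass instead of one metadata rewrite per snapshot.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: Update and RemoveSnapshot are simplified stand-ins,
// not the real org.apache.iceberg.MetadataUpdate classes.
class BulkUpdateSketch {

  interface Update {}

  static final class RemoveSnapshot implements Update {
    final long snapshotId;

    RemoveSnapshot(long snapshotId) {
      this.snapshotId = snapshotId;
    }
  }

  // Group updates by concrete class, preserving insertion order, so each
  // update type can be applied once, in bulk.
  static Map<Class<?>, List<Update>> groupByType(List<Update> updates) {
    Map<Class<?>, List<Update>> grouped = new LinkedHashMap<>();
    for (Update u : updates) {
      grouped.computeIfAbsent(u.getClass(), k -> new ArrayList<>()).add(u);
    }
    return grouped;
  }

  // Collect every snapshot ID from the grouped RemoveSnapshot updates; a
  // single bulk removal over this set replaces N per-snapshot rewrites.
  static Set<Long> snapshotIdsToRemove(Map<Class<?>, List<Update>> grouped) {
    Set<Long> ids = new HashSet<>();
    for (Update u : grouped.getOrDefault(RemoveSnapshot.class, List.of())) {
      ids.add(((RemoveSnapshot) u).snapshotId);
    }
    return ids;
  }
}
```

The handler would then pass the whole ID set to one bulk removal call, which is exactly what makes the server-side work linear in the number of snapshots.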
Proposal document: No response
Specifications