
Iceberg with data migrations #24780

Draft

bashtanov wants to merge 23 commits into dev

Conversation

bashtanov (Contributor) commented Jan 11, 2025

  • Add a test for Iceberg reading from a table whose topic was deleted
  • Fix minor data migration test issues
  • Add a test running Iceberg translation for topics that are unmounted and then, optionally, mounted again
  • For recovered and mounted topics, make Redpanda preserve most topic properties, including Iceberg ones
  • When unmounting, make sure all messages are translated for Iceberg

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • Make Iceberg and topic mount/unmount work well together

Check that with redpanda.iceberg.delete=false the old table data remains
available even before we recreate the topic.
Also switch back to the normal admin client after disruptions are over.
add log lines, fix typos
If we unmount the topic before this point, the table may lack metadata.
Introduce an "offline mode" that cuts all ties to the topic in the Redpanda
cluster. It carries on querying the query engine and verifying results
using info cached before going into offline mode.
so that the functionality is tested while the topic is being actively used
Make it possible to configure the number of messages produced by the stream
Add scenarios:
1) On unmount, all messages that made their way into the topic eventually
become available via the query engine
2) Upon remount and further produce, both old and new messages are present
in the topic and in the table
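The two scenarios can be sketched as a toy model. `Topic`, `IcebergTable`, and their methods are hypothetical stand-ins for a real Redpanda topic, its Iceberg table, and the background translation; the actual tests drive a cluster and a query engine instead:

```python
# Toy model of the two unmount/remount scenarios. All names here are
# illustrative, not Redpanda's real classes.

class IcebergTable:
    """Rows already translated from the topic, visible via the query engine."""
    def __init__(self):
        self.rows = []

class Topic:
    def __init__(self, table):
        self.table = table
        self.messages = []    # everything ever produced to the topic
        self.translated = 0   # messages that reached the table so far

    def produce(self, msgs):
        self.messages.extend(msgs)

    def translate_some(self, n):
        """Background datalake translation moves up to n messages."""
        batch = self.messages[self.translated:self.translated + n]
        self.table.rows.extend(batch)
        self.translated += len(batch)

    def unmount(self):
        """Scenario 1: unmount must first translate every remaining message."""
        self.translate_some(len(self.messages) - self.translated)

table = IcebergTable()
topic = Topic(table)

# Scenario 1: produce, translate only part, then unmount.
topic.produce([f"m{i}" for i in range(10)])
topic.translate_some(3)              # translation lags behind produce
topic.unmount()
assert table.rows == topic.messages  # all 10 messages reached the table

# Scenario 2: remount and produce more; old and new messages coexist.
topic.produce([f"m{i}" for i in range(10, 15)])
topic.translate_some(5)
assert len(topic.messages) == 15 and len(table.rows) == 15
```

The key invariant mirrored here is that unmount blocks until translation has caught up, so nothing produced before the unmount can be missing from the table.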
This is mostly to preserve iceberg properties, but also to make sure any
newly introduced topic properties are preserved by default.
Allow using it for subscriptions where feedback from the called function
is necessary, such as a future or an error code.
All functions are supposed to return the same type.
Make offset_monitor more universal so that it can be used for different
data types.
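A minimal sketch of such a generalized offset monitor. The real one is C++/Seastar; this is a plain-Python analogue with hypothetical names, showing the shape of the change: subscribers attach a function to an offset, advancing past that offset runs the function, and the results (all of one type, e.g. error codes) flow back to the notifier:

```python
import heapq
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

class OffsetMonitor(Generic[T]):
    """Hypothetical generalized offset monitor: waiters subscribe a
    function keyed by offset; notify() runs every due function and
    collects its feedback. All functions return the same type T."""

    def __init__(self) -> None:
        self._waiters: list[tuple[int, int, Callable[[], T]]] = []
        self._seq = 0        # tie-breaker so the heap never compares callables
        self.committed = -1  # highest offset seen so far

    def subscribe(self, offset: int, fn: Callable[[], T]) -> list[T]:
        if offset <= self.committed:
            return [fn()]    # offset already reached: run immediately
        heapq.heappush(self._waiters, (offset, self._seq, fn))
        self._seq += 1
        return []

    def notify(self, offset: int) -> list[T]:
        """Advance to `offset`, running every due subscription in offset
        order and returning their feedback to the caller."""
        self.committed = max(self.committed, offset)
        results: list[T] = []
        while self._waiters and self._waiters[0][0] <= self.committed:
            _, _, fn = heapq.heappop(self._waiters)
            results.append(fn())
        return results

mon: OffsetMonitor[int] = OffsetMonitor()
mon.subscribe(5, lambda: 0)    # e.g. "flush succeeded" error code 0
mon.subscribe(9, lambda: 1)
assert mon.notify(7) == [0]    # only the offset-5 waiter fires
assert mon.notify(12) == [1]
```

Making the monitor generic over T is what lets the same mechanism serve waiters that need a future back and waiters that need an error code, as long as each monitor instance sticks to one result type.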
Also create and subscribe one such action: flushing data to the cloud.
Wait for the offset to be translated when asked by the partition to "flush".
When blocking writes, collect the offset of the blocking message.
Then use it to dispatch an all-components flush through the partition
(leading to a cloud storage flush, which ignores the offset parameter, and
to the datalake translator, which waits for the corresponding Kafka offset).
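The dispatch described above can be sketched roughly as follows. The class and method names (`CloudStorage`, `DatalakeTranslator`, `Partition`, `block_writes_and_flush`) are illustrative stand-ins, not Redpanda's real API, and the translator's "wait" is modelled as catching up synchronously:

```python
# Hypothetical sketch: one flush entry point on the partition fans out to
# components that interpret the offset parameter differently.

class CloudStorage:
    def __init__(self):
        self.flushed = False

    def flush(self, offset):
        # Cloud storage uploads whatever it has; the offset parameter is
        # accepted for a uniform interface but ignored.
        self.flushed = True
        return True

class DatalakeTranslator:
    def __init__(self):
        self.translated_offset = -1

    def flush(self, offset):
        # The translator instead waits until translation has reached the
        # corresponding Kafka offset (modelled here as a synchronous
        # catch-up rather than a real asynchronous wait).
        self.translated_offset = max(self.translated_offset, offset)
        return self.translated_offset >= offset

class Partition:
    def __init__(self, components):
        self.components = components
        self.last_offset = -1

    def produce(self, n):
        self.last_offset += n

    def block_writes_and_flush(self):
        # Blocking writes yields the offset of the blocking message; that
        # offset is then dispatched to every component's flush.
        blocking_offset = self.last_offset
        return all(c.flush(blocking_offset) for c in self.components)

cloud = CloudStorage()
translator = DatalakeTranslator()
p = Partition([cloud, translator])
p.produce(42)
assert p.block_writes_and_flush()
assert cloud.flushed and translator.translated_offset == 41
```

Routing one offset through a uniform flush interface is what lets unmount guarantee that every message admitted before writes were blocked is both uploaded and translated.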
@bashtanov (Contributor, Author) commented:

/dt
