
Iceberg with data migrations #24780

Draft

bashtanov wants to merge 23 commits into dev

Conversation

bashtanov (Contributor) commented Jan 11, 2025

  • Add a test for Iceberg reading from a table whose topic was deleted
  • Fix minor data migration test issues
  • Add a test running Iceberg translation for topics that are unmounted and then, optionally, mounted again
  • For recovered and mounted topics, make Redpanda preserve most topic properties, including Iceberg ones
  • When unmounting, make sure all messages are translated for Iceberg

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • Make Iceberg and topic mount/unmount work well together

Check that with redpanda.iceberg.delete=false the old table data remains
available even before we recreate the topic.
Also switch back to the normal admin client after disruptions are over.
add log lines, fix typos
If we unmount the topic before this point, the table may lack metadata.
Introduce an "offline mode" that cuts all ties to the topic in the Redpanda
cluster. It carries on querying the query engine and verifying results
using info cached before going into offline mode.
so that the functionality is tested while the topic is being actively used
Make it possible to configure the number of messages produced by the stream
Add scenarios:
1) On unmount, all messages that made their way into the topic eventually
become available via the query engine
2) Upon remount and further produce, both old and new messages are present
in the topic and in the table
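The two scenarios can be sketched as a toy model. `Topic`, `IcebergTable`, and their methods are hypothetical stand-ins for a real Redpanda topic, its Iceberg table, and the background translation; the actual tests drive a cluster and a query engine instead:

```python
# Toy model of the two unmount/remount scenarios. All names here are
# illustrative, not Redpanda's real classes.

class IcebergTable:
    """Rows already translated from the topic, visible via the query engine."""
    def __init__(self):
        self.rows = []

class Topic:
    def __init__(self, table):
        self.table = table
        self.messages = []    # everything ever produced to the topic
        self.translated = 0   # messages that reached the table so far

    def produce(self, msgs):
        self.messages.extend(msgs)

    def translate_some(self, n):
        """Background datalake translation moves up to n messages."""
        batch = self.messages[self.translated:self.translated + n]
        self.table.rows.extend(batch)
        self.translated += len(batch)

    def unmount(self):
        """Scenario 1: unmount must first translate every remaining message."""
        self.translate_some(len(self.messages) - self.translated)

table = IcebergTable()
topic = Topic(table)

# Scenario 1: produce, translate only part, then unmount.
topic.produce([f"m{i}" for i in range(10)])
topic.translate_some(3)              # translation lags behind produce
topic.unmount()
assert table.rows == topic.messages  # all 10 messages reached the table

# Scenario 2: remount and produce more; old and new messages coexist.
topic.produce([f"m{i}" for i in range(10, 15)])
topic.translate_some(5)
assert len(topic.messages) == 15 and len(table.rows) == 15
```

The key invariant mirrored here is that unmount blocks until translation has caught up, so nothing produced before the unmount can be missing from the table.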
This is mostly to preserve iceberg properties, but also to make sure any
newly introduced topic properties are preserved by default.
Allow using it for subscriptions where feedback from the called function
is necessary, such as a future or an error code.
All functions are supposed to return the same type.
Make offset_monitor more universal so that it can be used for different
data types.
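A minimal sketch of such a generalized offset monitor. The real one is C++/Seastar; this is a plain-Python analogue with hypothetical names, showing the shape of the change: subscribers attach a function to an offset, advancing past that offset runs the function, and the results (all of one type, e.g. error codes) flow back to the notifier:

```python
import heapq
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

class OffsetMonitor(Generic[T]):
    """Hypothetical generalized offset monitor: waiters subscribe a
    function keyed by offset; notify() runs every due function and
    collects its feedback. All functions return the same type T."""

    def __init__(self) -> None:
        self._waiters: list[tuple[int, int, Callable[[], T]]] = []
        self._seq = 0        # tie-breaker so the heap never compares callables
        self.committed = -1  # highest offset seen so far

    def subscribe(self, offset: int, fn: Callable[[], T]) -> list[T]:
        if offset <= self.committed:
            return [fn()]    # offset already reached: run immediately
        heapq.heappush(self._waiters, (offset, self._seq, fn))
        self._seq += 1
        return []

    def notify(self, offset: int) -> list[T]:
        """Advance to `offset`, running every due subscription in offset
        order and returning their feedback to the caller."""
        self.committed = max(self.committed, offset)
        results: list[T] = []
        while self._waiters and self._waiters[0][0] <= self.committed:
            _, _, fn = heapq.heappop(self._waiters)
            results.append(fn())
        return results

mon: OffsetMonitor[int] = OffsetMonitor()
mon.subscribe(5, lambda: 0)    # e.g. "flush succeeded" error code 0
mon.subscribe(9, lambda: 1)
assert mon.notify(7) == [0]    # only the offset-5 waiter fires
assert mon.notify(12) == [1]
```

Making the monitor generic over T is what lets the same mechanism serve waiters that need a future back and waiters that need an error code, as long as each monitor instance sticks to one result type.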
Also create and subscribe one such action: flushing data to the cloud.
Wait for the offset to be translated when asked by the partition to "flush".
When blocking writes, collect the offset of the blocking message.
Then use it to dispatch an all-components flush through the partition
(leading to a cloud storage flush, which ignores the offset parameter, and
to the datalake translator, which waits for the corresponding Kafka offset).
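The dispatch described above can be sketched roughly as follows. The class and method names (`CloudStorage`, `DatalakeTranslator`, `Partition`, `block_writes_and_flush`) are illustrative stand-ins, not Redpanda's real API, and the translator's "wait" is modelled as catching up synchronously:

```python
# Hypothetical sketch: one flush entry point on the partition fans out to
# components that interpret the offset parameter differently.

class CloudStorage:
    def __init__(self):
        self.flushed = False

    def flush(self, offset):
        # Cloud storage uploads whatever it has; the offset parameter is
        # accepted for a uniform interface but ignored.
        self.flushed = True
        return True

class DatalakeTranslator:
    def __init__(self):
        self.translated_offset = -1

    def flush(self, offset):
        # The translator instead waits until translation has reached the
        # corresponding Kafka offset (modelled here as a synchronous
        # catch-up rather than a real asynchronous wait).
        self.translated_offset = max(self.translated_offset, offset)
        return self.translated_offset >= offset

class Partition:
    def __init__(self, components):
        self.components = components
        self.last_offset = -1

    def produce(self, n):
        self.last_offset += n

    def block_writes_and_flush(self):
        # Blocking writes yields the offset of the blocking message; that
        # offset is then dispatched to every component's flush.
        blocking_offset = self.last_offset
        return all(c.flush(blocking_offset) for c in self.components)

cloud = CloudStorage()
translator = DatalakeTranslator()
p = Partition([cloud, translator])
p.produce(42)
assert p.block_writes_and_flush()
assert cloud.flushed and translator.translated_offset == 41
```

Routing one offset through a uniform flush interface is what lets unmount guarantee that every message admitted before writes were blocked is both uploaded and translated.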
@bashtanov (Contributor, Author) commented:

/dt
