Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-8817] fix backward compatibility issue #12615

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Davis-Zhang-Onehouse
Copy link
Contributor

@Davis-Zhang-Onehouse Davis-Zhang-Onehouse commented Jan 10, 2025

Change Logs

when we specify write table version as 6

CREATE TABLE hudi_14_table_sql_005 (
    event_id INT,
    event_date STRING,
    event_name STRING,
    event_ts STRING,
    event_type STRING
) USING hudi
 OPTIONS(
    type = 'cow', -- or 'mor'
    primaryKey = 'event_id,event_date',
    preCombileField = 'event_ts',
    hoodie.write.table.version = 6
)
PARTITIONED BY (event_type)
LOCATION 's3://<some_bucket>/hudi_14_table_sql_005';

the hoodie.properties said the version is 8.

It is because at SQL create table code path we didn't use hoodie.write.table.version to config hoodie.table.version. Now we have the following rules:

  • If both hoodie.write.table.version and hoodie.table.version are set, hoodie.table.version prevails
  • If only one of them is set explicitly, that one will be used to configure hoodie.table.version.

Impact

hoodie.write.table.version will also contribute to hoodie.table.version.

Risk level (write none, low medium or high below)

none

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Jan 10, 2025
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Jan 10, 2025
- Look up in col stats index based on indexed cols. Non indexed cols are ignored while looking up if rest of the filter expr is eligible to be looked up.
@github-actions github-actions bot added size:M PR with lines of changes in (100, 300] and removed size:L PR with lines of changes in (300, 1000] labels Jan 10, 2025
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@@ -1271,10 +1272,24 @@ public TableBuilder fromProperties(Properties properties) {
setTableName(hoodieConfig.getString(HoodieTableConfig.NAME));
}

if (hoodieConfig.contains(VERSION)) {
if (hoodieConfig.contains(VERSION) && hoodieConfig.contains(WRITE_TABLE_VERSION)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in meta client should not consider write table version. Otherwise, there is no point checking whether hoodie.write.table.version and hoodie.table.version match.

CommonClientUtils#validateTableVersion has the validation. Does Spark SQL writer escape that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, SQL writer does not call this function at all for create table stmt

@Davis-Zhang-Onehouse Davis-Zhang-Onehouse marked this pull request as draft January 12, 2025 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:M PR with lines of changes in (100, 300]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants