[Kernel] Add AllowDefaultColumns table feature to kernel #5031
Force-pushed from dfeca68 to 289f411
```java
public static boolean supports(Row transactionState, TableFeature feature) {
  return FEATURE_PROP_SUPPORTED.equals(
      TransactionStateRow.getConfiguration(transactionState)
          .getOrDefault(
              TableFeatures.SET_TABLE_FEATURE_SUPPORTED_PREFIX + feature.featureName(),
              "unknown"));
}
```
Move the validation to ColumnDefaults, just like the blockIfColumnMappingEnabled in ColumnMapping.
Created a validation method ColumnDefaults.blockIfEnabled. It seems there is not yet a TableFeature.supports common util, so I suggest we keep this one. wdyt?
Force-pushed from dbe0406 to 837c4ec
Force-pushed from 837c4ec to d3a155f
```java
try {
  String stripped = stripQuotationMark(value);
  if ((type instanceof StringType) || (type instanceof BinaryType)) {
    return;
```
If we have the expression "a+b(x)", will this return successfully?
Yes, and it will be treated as a literal string, not an expression. I updated the validation to require String/Binary defaults to always be enclosed in quotes: "CURRENT_TIMESTAMP()" is valid and will be treated as a literal, while unquoted CURRENT_VALUE() is invalid.
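The quoting rule described above can be sketched as follows. `QuotedLiteralCheck` and `isQuotedLiteral` are hypothetical names for illustration, not the PR's actual code:

```java
// Hypothetical sketch of the quoting rule described above: a String/Binary
// default must be wrapped in a matching pair of quotes to count as a literal,
// so "CURRENT_TIMESTAMP()" passes while bare CURRENT_VALUE() is rejected.
public final class QuotedLiteralCheck {
    public static boolean isQuotedLiteral(String value) {
        if (value == null || value.length() < 2) {
            return false;
        }
        char first = value.charAt(0);
        char last = value.charAt(value.length() - 1);
        // Both ends must carry the same quote character.
        return (first == '"' || first == '\'') && first == last;
    }
}
```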
```java
if (field.getMetadata().contains(DEFAULT_VALUE_METADATA_KEY)) {
  validateLiteral(
      field.getDataType(), field.getMetadata().getString(DEFAULT_VALUE_METADATA_KEY));
```
I think we want to validate only if the expression has changed, right? As currently written we always validate.
Both options (always validating, or validating only on change) satisfy the Iceberg requirements, and I am okay with either. Followed the comment and changed to validate only on change.
```java
  }
}

private static String stripQuotationMark(String input) {
```
does this work? input.replaceAll("^[\"']+|[\"']+$", "");
This will not work, as it would also strip non-matching quotes such as 'xxx" and "xxx'.
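A matching-pair-only strip could look like this (an assumed shape for illustration, not the PR's actual implementation):

```java
// Strip one outer quote pair only when the opening and closing quotes match;
// 'xxx" and "xxx' are returned unchanged, unlike the regex suggestion above.
public final class QuoteStripper {
    public static String stripQuotationMark(String input) {
        if (input.length() >= 2) {
            char first = input.charAt(0);
            char last = input.charAt(input.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                return input.substring(1, input.length() - 1);
            }
        }
        return input;
    }
}
```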
Force-pushed from d3a155f to e9f7409
Force-pushed from e9f7409 to 7b7b7ce
```java
/**
 * Validate Column Default value changes in the provided metadata. 1. Only the added/changed
 * default value is checked. 2. Kernel only supports literal default values. See
 * {validateLiteral}.
```
Sorry can you clarify what "checked" means?
Can you also clarify the allowed actions?
- Can you remove defaults from existing columns?
- Can you add defaults to existing columns?
- Can you add new columns with a default?
- Can you change the default value for an existing column?
etc
I'd like to better understand what this "validation" should be exactly. I have a feeling it might make sense for this code, instead of being separate, to be part of the schema evolution validation code or other schema validation code added by @amogh-jahagirdar.
Also can you clarify what needs to be validated for a new table vs an existing table?
Also wondering about potential dependencies with type widening? cc @emkornfield
+100 on the new tables. I'm a little confused because I think we would do 0 validation in that case currently?
Is the current validation just that the type matches (for existing tables)?

Honestly, why do we need to check this only if the metadata has changed between new and old? It probably isn't too costly to just check that the type matches for all default columns and only validate the new schema (no need to compare between old and new)?

I think I still need to better understand if there's any other validation we should be doing besides that for schema evolution, however.
Covered all answers here.

> can you clarify what "checked" means

"Check" means making sure the default value satisfies items 2/3/4 listed in the PR description.

> Can you also clarify the allowed actions?

- Can you remove defaults from existing columns? Yes.
- Can you add defaults to existing columns? Yes.
- Can you add new columns with a default? Yes.
- Can you change the default value for an existing column? Yes.

> can you clarify what needs to be validated for a new table vs an existing table

> on the new tables. I'm a little confused because I think we would do 0 validation in that case currently?

The validation of the data types and default values is the same. For the create-table case we NEED to check the data types and the literal values. I think the recent refactoring of TransactionMetadataFactory caused my confusion; I did not realize that new metadata and updated metadata are stored separately. I will cover the CREATE TABLE case.

> Honestly, why do we need to only check this if the metadata has changed between new and old? It probably isn't too costly to just check the type matches for all default columns and only validate the new schema (no need to compare between old and new)?

This is the result of a discussion with @vkorukanti. See #5031 (comment). The idea is that Kernel only provides limited support for this table feature, and there can be existing tables with default values that Kernel does not support. As long as customers do not change these default values through Kernel, we want to leave them as is.
Left a comment below on type widening.
```java
newDefaults.forEach(
    (path, field) -> {
      String newDefaultValue = field.getMetadata().getString(DEFAULT_VALUE_METADATA_KEY);
      StructField existingField = oldDefaults.getOrDefault(path, defaultField);
```
Sorry, why do we do this? I think we can have a better solution for dealing with oldDefaults not containing path than plugging in this fake field, right? This is super unintuitive.
We handle the cases where:

- oldDefaults does not contain path
- oldDefaults contains the path (has the field) but the metadata changes

Please kindly provide your suggestion for improving the code.
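One way to avoid the sentinel field, sketched with plain maps of raw default strings (hypothetical names, not the PR's actual types):

```java
import java.util.Map;
import java.util.Objects;

// Sketch: detect an added or changed default without a fake StructField.
// A path missing from oldDefaults simply compares as null.
public final class DefaultChangeCheck {
    public static boolean defaultAddedOrChanged(
            Map<String, String> oldDefaults, String path, String newDefaultValue) {
        // Covers both cases above: path absent, or default value differing.
        return !Objects.equals(oldDefaults.get(path), newDefaultValue);
    }
}
```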
```java
public static final TableFeature ALLOW_COLUMN_DEFAULTS_W_FEATURE =
    new AllowColumnDefaultsTableFeature();

private static class AllowColumnDefaultsTableFeature extends TableFeature.WriterFeature {
```
@vkorukanti wdyt should this be auto-enabled by metadata?
Talked with @vkorukanti. This is enabled by the schema (the schema contains a default value), and it does not need a config key to enable it.
Yes -- if we want it to be auto-enabled, then it should extend FeatureAutoEnabledByMetadata and override metadataRequiresFeatureToBeEnabled so that it can be auto-enabled by the schema. See VariantType or TimestampNtz for examples.
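For illustration, schema-driven auto-enablement could look roughly like this. FeatureAutoEnabledByMetadata and metadataRequiresFeatureToBeEnabled are real Kernel concepts mentioned above, but the Metadata interface here is a simplified stand-in, not Kernel's actual API surface:

```java
// Simplified stand-in types -- not Kernel's actual API surface.
public final class AutoEnableSketch {
    public interface Metadata {
        // True if any field in the schema carries the given metadata key.
        boolean schemaContainsKey(String metadataKey);
    }

    // The feature would be required whenever a field declares a default.
    public static boolean metadataRequiresFeatureToBeEnabled(Metadata metadata) {
        return metadata.schemaContainsKey("CURRENT_DEFAULT");
    }
}
```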
We are intentionally not doing this to keep the behavior in sync with the Edge version. This can be updated later though.
```java
 */
public class ColumnDefaults {

  private static final String DEFAULT_VALUE_METADATA_KEY = "CURRENT_DEFAULT";
```
cc @raveeram-db @scottsand-db @emkornfield -- another scenario regarding our StructField APIs? Another example of unstructured metadata stored in the FieldMetadata.
:( was there any bandwidth allocated for trying to address it this quarter?
We should at least have a decision on the direction we want to go, and do the right thing for this modelling (e.g. metadata-only vs. modelling it directly on FieldSchema).
```diff
@@ -169,6 +170,9 @@ static CloseableIterator<FilteredColumnarBatch> transformLogicalData(
     boolean isIcebergCompatEnabled =
         isIcebergCompatV2Enabled(transactionState) || isIcebergCompatV3Enabled(transactionState);
     blockIfColumnMappingEnabled(transactionState);
+    // We recognize the AllowColumnDefaults feature for Iceberg v3
+    // but do not support writing with it yet
+    ColumnDefaults.blockIfEnabled(transactionState);
```
Yeah, +1 to what I said before: this is not tested and I think it likely wouldn't fail correctly. This potentially needs to look at the schema. For testing: maybe look at how blockIfColumnMappingEnabled(transactionState) was tested? It needs to be tested E2E, not just how it's tested in the TableFeaturesSuite.
Another option is adding writerFeatures to TransactionStateRow.
Changed to look at whether the schema contains a default value.
Unit tests are great, but we also need some E2E tests in the kernelDefaults module for the validation checks (otherwise how can we know they are correctly executed as part of the metadata validation?). Should cover new tables & existing tables w/ schema evolution.
Thanks for the catch. Will add that.
```diff
@@ -555,6 +557,11 @@ private void validateMetadataChangeAndApplyTypeWidening() {
     MaterializedRowTrackingColumn.throwIfColumnNamesConflictWithSchema(getEffectiveMetadata());
   }

+  private void validateColumnDefaultsChanges() {
```
One more q: should we check/require that the table feature is enabled if we see defaults in the metadata?
Added a check that requires the table feature to be present if the schema contains a default.
```java
for (SchemaIterable.SchemaElement element : new SchemaIterable(schema)) {
  StructField field = element.getField();
  if (field.getMetadata().get(DEFAULT_VALUE_METADATA_KEY) != null) {
    result.put(element.getNamePath(), field);
```
What happens if column mapping is enabled?
Use fieldId as the key when ColumnMapping is enabled in both the old metadata and the new metadata.
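The keying rule can be sketched like this (hypothetical helper; the fieldId handling is simplified for illustration):

```java
import java.util.OptionalLong;

// Key defaults by column-mapping fieldId when both old and new metadata have
// column mapping enabled; otherwise fall back to the dotted name path.
public final class DefaultsKey {
    public static String keyFor(
            boolean columnMappingEnabledInBoth, OptionalLong fieldId, String namePath) {
        if (columnMappingEnabledInBoth && fieldId.isPresent()) {
            return "id:" + fieldId.getAsLong();
        }
        return "name:" + namePath;
    }
}
```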
```java
}

private static void validateLiteral(DataType type, String value) {
  try {
```
We don't have literal validation anywhere else in Kernel?
Checked with @vkorukanti and we don't believe there is something ready for use.
```java
    throw new IllegalArgumentException("invalid default value " + value + " for " + type);
  }
} else if (type instanceof BooleanType) {
  Boolean.parseBoolean(stripped);
```
do java built-ins provide good error messages?
It would be good to ensure a consistent, hopefully actionable error message for all cases where this fails.
Exceptions thrown from here will be captured and will tell the user that they need to update the default value.
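One pitfall with the built-ins discussed here: Boolean.parseBoolean never throws (anything other than "true" silently becomes false), so on its own it validates nothing. A sketch of an explicit check with a consistent, actionable message (an assumed shape, not the PR's exact code):

```java
// Boolean.parseBoolean("yes") silently returns false, so reject anything
// that is not exactly "true" or "false" with a uniform error message.
public final class BooleanLiteralCheck {
    public static void validateBooleanLiteral(String stripped) {
        if (!"true".equalsIgnoreCase(stripped) && !"false".equalsIgnoreCase(stripped)) {
            throw new IllegalArgumentException(
                "invalid default value " + stripped
                    + " for boolean; update the column default to true or false");
        }
    }
}
```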
```java
/**
 * Validate Column Default value changes in the provided metadata. 1. Only the added/changed
 * default value is checked. 2. Kernel only supports literal default values. See
 * {validateLiteral}.
```
Left a comment below on type widening.
```java
} else if (type instanceof BooleanType) {
  Boolean.parseBoolean(stripped);
} else if (type instanceof DateType) {
  LocalDate.parse(stripped, DateTimeFormatter.ISO_LOCAL_DATE);
```
Are writers supposed to update the default value if the type changes? I think BigDecimal might be OK, but it isn't clear to me if DATE -> TimestampNTZ is handled correctly.
There is no support for type widening now. E.g., when a client changes a column type, they are REQUIRED to also change the default value, and we don't support providing a timestamp value to a date field. We may improve this in the future by allowing more compatible values. For now this table feature is meant to be a minimal working version.
OK, as a follow-up we should make sure that type widening fails appropriately if the value isn't changed.
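To illustrate the requirement: a default that was valid for DATE does not parse as TIMESTAMP_NTZ, so a client changing the type must also change the default. A sketch under assumed ISO formats:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// A DATE default like "2020-01-01" fails ISO_LOCAL_DATE_TIME parsing, so a
// DATE -> TIMESTAMP_NTZ type change forces the client to update the default.
public final class TypeChangeCheck {
    public static boolean validAsDate(String v) {
        try {
            LocalDate.parse(v, DateTimeFormatter.ISO_LOCAL_DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static boolean validAsTimestampNtz(String v) {
        try {
            LocalDateTime.parse(v, DateTimeFormatter.ISO_LOCAL_DATE_TIME);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```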
```java
String newDefaultValue = field.getMetadata().getString(DEFAULT_VALUE_METADATA_KEY);
StructField existingField = oldDefaults.getOrDefault(path, defaultField);
if (!Objects.equals(
    existingField.getMetadata().getString(DEFAULT_VALUE_METADATA_KEY), newDefaultValue)) {
```
It seems that we should only validateLiteral if Iceberg V3 is enabled. This check is narrower than the Delta spec, which seems to allow for arbitrary SQL expressions.
Exactly. As mentioned above, this is a minimal working implementation of the default value table feature that is meant to be mainly used with Iceberg V3. We support only literals now because:

- We don't expect a lot of non-V3 usage of this table feature.
- Proper validation of SQL expressions may require support from engines, which is missing now.

Due to time constraints, we do not plan to implement checks covering SQL expressions and type widening for now. Thus it is safer to just apply this check to all cases rather than relax it for the non-V3 case. We don't expect many non-V3 use cases anyway.
Sorry, I might be confused, but it looks like validateColumnDefaultsChanges is called regardless of whether the table actually has Iceberg compat enabled? If so, this is a potentially breaking change. At the very least we should have a helpful error message in the exception raised that distinguishes between:

- Iceberg compat is enabled (only literal values are allowed for defaults).
- Iceberg compat is not enabled (the default value update was not a literal, but this is just a feature gap in Kernel).
@emkornfield Edited my previous comments a bit. We are not ready to parse and validate general SQL expressions in the non-V3 case. If we apply the checks only to the V3 case, the result will be:

- V3 enabled -> current implementation (literals only)
- V3 not enabled -> no check on the default value

In the worst case, a user could use Kernel to generate a Delta table with a default value that no engine recognizes. Are we okay with that? cc @vkorukanti, as I think we have discussed and agreed that we don't want Kernel to generate unreadable Delta tables.

Regarding your concern about a "breaking change", I don't quite get it. The check will only be activated if a user adds or updates a column with a default value. If they don't change the default values in the schema, this check will not be activated, and there will be no impact to them at all. Please let me know if this is the case you have in mind.
Was chatting with @harperjiang offline regarding this. To avoid confusion, let's perhaps start with allowing this table feature to be set only on IcebergWriterCompatV3 tables, folks? That way it's scoped, and it is clear that non-V3 tables have a feature gap that we can address in the future. Does that sound good @emkornfield @vkorukanti ?
> In the worst case, a user could use Kernel to generate a Delta table with a default value that no engine recognizes. Are we okay with that?

I think it is fine to fail, but the reason for failure is different. For Iceberg V3 we fail because Iceberg only supports literals. For non-Iceberg V3 we fail because Kernel doesn't support parsing arbitrary expressions. As a user of Kernel, I think the difference in the reason for failure should be made clear if possible.

> Regarding your concern of "breaking change", I don't quite get it.

IIUC, before this change a user could put the default value into field properties (perhaps without the feature flag); if that value was a valid expression, would this new code cause the direct put into field properties to fail? I'd guess consumers aren't doing this in general, but that was the breaking change I was referring to.
> To avoid confusion, let's perhaps start with allowing setting this table feature only on IcebergWriterCompatV3 tables?

This works for me as well.
Sounds good to me. Will follow @raveeram-db 's suggestion.

I think this is fine. I responded to the prior comment, but I think we should be telling users explicitly why the check failed (i.e. it is V3 and must be a literal, or it is not V3 and we simply haven't implemented support for expressions).
Force-pushed from 55278cc to 8d1d49a
@emkornfield @raveeram-db thanks for the discussion.
I think I am happy for it to be more restrictive now; we can always add support later on. Agree with @emkornfield that the error message is important. To that note, I strongly feel that this check for it being literal should definitely happen as part of the checks in [...]. Along those lines, I commented this somewhere, but can you look into adding this validation to [...]?

I'm still not sure I'm on board with only checking updated default values -- shouldn't we not support writing to a table with any default values that we don't understand? I guess we don't technically support writing data to a table with any default values at all... but I'm not completely convinced of a good reason to only check the updated ones either.
I agree that in the final stage, we will have two separate validations: [...]

Right now, the expression validation is not ready, and we need to have some kind of validation for the Default Value table feature. The solution is to use the literal-only validation in Default Value for now. I also agree that once we start working on the "complete" Default Value validation, the literal-only check should be moved to a separate V3-only check, but maybe not now. In addition, CompatV3 is an internal feature which does not have a public RFC yet.

The decision of checking only changed values follows @vkorukanti 's suggestion. I think the main purpose is to allow Kernel to read as many Delta tables as possible. I am fine with either way and will leave the decision to you guys. Please lmk.
@allisonport-db @raveeram-db
SGTM. This also only applies to table writes, not reads! So it should not have any impact on reads.
So just for clarity, we will require, if we see a default column, that: [...]

And: [...]

Overall, this SGTM and makes me feel a lot better about safety. We can always loosen restrictions in the future. Btw, I think if you update [...]
Force-pushed from 7daf89e to 7cd3452
@allisonport-db all comments addressed. Please have a look at your convenience. Thanks!
Thanks for the update! A few comments on how we enforce the icebergCompatV3 requirement. Since this is a temporary requirement in Kernel, it shouldn't be baked into the table feature framework, and we should try to throw a more informative error message. Will review tests on the next pass. Ty!
Force-pushed from 14fa6b5 to 9de3670
```java
  throw new KernelException(
      "This table does not enable table features for setting column defaults");
  }
} else if (!isIcebergCompatV3Enabled) {
```
Isn't this wrong, always throwing an exception regardless of whether or not there are default values? Also concerning: why is this not caught by a test?!

The flow of this function really should be:

- Collect default values
- If default values are present:
  - If !isDefaultColumnsEnabled => throw exception
  - If !isIcebergCompatV3Enabled => throw specific exception
  - Validate default values
If the intention is to always throw when defaultColumns is enabled and icebergCompatV3 is not, regardless of whether any default values are actually present, then can we make that super clear -- the way this is written is very confusing.
> If the intention is to always throw when defaultColumns is enabled and icebergCompatV3 is not, regardless of whether any default values are actually present

This is the goal. We want to make sure V3 is enabled when the Default Value table feature is used in Kernel.

> the way this is written is very confusing.

I am not sure which part is confusing. I think the error message clearly indicates that the two features need to be enabled together. I can probably tweak the order of the "if else". If you can clarify the confusing part I will also be happy to follow. Does the following look good to you?

```java
if (isEnabled && !isIcebergCompatV3Enabled) {
  throw ...
}
if (!isEnabled) {
  if (defaultValues.findAny().isPresent()) {
    throw ...
  }
} else {
  ...
}
```
Yeah, I think that is more clear -- and can you add a comment explaining why we do this?
Future readers won't understand why we have this restriction for now
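The agreed flow, with the explanatory comment the reviewers asked for, might look like this. The flag names and exception type here are assumptions for the sketch, not the PR's actual code:

```java
// Sketch of the gating logic discussed above, with a comment that records
// why the icebergCompatV3 restriction exists.
public final class FeatureGate {
    public static void checkDefaults(
            boolean isEnabled, boolean isIcebergCompatV3Enabled, boolean schemaHasDefaults) {
        // Kernel currently supports column defaults only together with
        // icebergCompatV3; this restriction may be lifted later.
        if (isEnabled && !isIcebergCompatV3Enabled) {
            throw new IllegalStateException(
                "allowColumnDefaults requires icebergCompatV3 to be enabled");
        }
        if (!isEnabled && schemaHasDefaults) {
            throw new IllegalStateException(
                "schema contains default values but allowColumnDefaults is not enabled");
        }
        // Otherwise: either no defaults with the feature off, or the feature on
        // with V3 enabled; per-value literal validation would run next.
    }
}
```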
```java
Stream<StructField> defaultValues = extractFieldsWithDefaultValues(schema);
if (!isEnabled) {
  if (defaultValues.findAny().isPresent()) {
    throw new KernelException(
```
This should be an InvalidTableException
Pushing back on this.

- We have similar methods, e.g. DeltaErrors.unsupportedDataType, that use a KernelException.
- The InvalidTableException needs a table path and version, which are not available in SchemaUtils.validateSchema. Passing that info into this method requires refactoring several APIs in SchemaUtils. This can be done in a later refactoring, but I wonder if the change is worth it.
```java
throw new UnsupportedOperationException(
    "Kernel does not support column defaults for " + type.toString());
```
What types fall here? Are they unsupported by Kernel, or unsupported for Default Columns in general? I.e., I assume you cannot have a default value for a Map type. In that case, the error should be something like InvalidTableException, since that is not a valid state for a Delta table.
Unsupported types include Map/List/Variant/Geo, etc. These limitations are not part of the protocol spec, so I would say this is an implementation limitation rather than an invalid table.
In addition, the current API design of `SchemaUtils` makes it hard to use `InvalidTableException` there (table info is not passed in). This can be improved later in a separate PR, with a refactoring of the API.
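To make the distinction concrete, here is a minimal sketch of such a type gate. The type names and the exact unsupported set are assumptions for illustration, not Kernel's actual `DataType` hierarchy or the PR's final policy.

```java
import java.util.Set;

// Minimal sketch of the "implementation limitation" discussed above: a
// predicate over type names saying which types could carry a column default.
public class DefaultableTypes {

  // Nested and semi-structured types are rejected, mirroring the
  // Map/List/Variant/Geo examples in the thread (an assumption, not spec).
  private static final Set<String> UNSUPPORTED =
      Set.of("map", "array", "struct", "variant", "geometry", "geography");

  // Returns true when a default value would be accepted for this type name.
  public static boolean supportsDefaults(String typeName) {
    return !UNSUPPORTED.contains(typeName.toLowerCase());
  }

  public static void main(String[] args) {
    System.out.println(supportsDefaults("integer")); // prints true
    System.out.println(supportsDefaults("map"));     // prints false
  }
}
```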
field -> {
  try {
    validateLiteral(field.getDataType(), getRawDefaultValue(field));
  } catch (IllegalArgumentException | UnsupportedOperationException e) {
I don't really like that we catch and rethrow exceptions both here and within validateLiteral. I'm also not really sure what the expected CUJ for the various cases is here; can you maybe explain?
Well, the design of catching exceptions within `validateLiteral` is to prepare it to be exposed as a public method later. But I am totally fine not catching exceptions in there for now.
By "the expected CUJ for the various cases" do you mean when we use the different exceptions? Basically, `IllegalArgumentException` is thrown for an invalid default value on the given column (e.g., a non-date literal for a date column), and `UnsupportedOperationException` is thrown for columns that should not have a default value at all. If that's not what you are asking, please clarify.
@allisonport-db updated. Please see my comments.
7a0eb75 to 4dd26da
Thanks for the updates, a few more requests + took a look at the tests (they look awesome so far thanks!)
if (!isEnabled) {
  if (defaultValues.findAny().isPresent()) {
    throw new KernelException(
        "This table does not enable table features for setting column defaults");
Can you make the error message more clear? E.g. "Found column defaults in the schema but the table does not support column defaults with the columnDefaults table feature."
Also please put in DeltaErrors + a todo to make this an InvalidTableException
} catch (DateTimeParseException
    | IllegalArgumentException
    | UnsupportedOperationException e) {
  throw DeltaErrors.icebergCompatRequiresLiteralDefaultValue(
I think my comment before was related to which exception we throw. For example, if we find a column default for a data type we don't support defaults for, we should throw that exception, not the requires-literal-value one.
Can we make `validateLiteral` have very clearly defined docs + throws behavior?
- `@throws UnsupportedOperationException` if Kernel doesn't support default values for the data type
- `@throws IllegalArgumentException` if the value is not a literal value

In `validateLiteral` we should only throw those two exceptions (so we need to catch the parse exceptions there). Here, we can then be specific about our behavior, i.e. catch `IllegalArgumentException` and rethrow `icebergCompatRequiresLiteralDefaultValue`; probably just let `UnsupportedOperationException` propagate upward.
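A contract along those lines could be sketched as follows. `SimpleType`, the class name, and the method bodies are hypothetical stand-ins for illustration, not Kernel's actual API; the point is the two-exception surface and catching parse exceptions internally.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class ValidateLiteralSketch {

  public enum SimpleType { INTEGER, DATE, MAP }

  /**
   * Validates that a raw default value is a literal of the given type.
   *
   * @throws UnsupportedOperationException if Kernel doesn't support default
   *     values for the data type
   * @throws IllegalArgumentException if the value is not a valid literal
   */
  public static void validateLiteral(SimpleType type, String raw) {
    switch (type) {
      case INTEGER:
        try {
          Integer.parseInt(raw);
        } catch (NumberFormatException e) { // subclass of IllegalArgumentException
          throw new IllegalArgumentException("Not an integer literal: " + raw, e);
        }
        break;
      case DATE:
        try {
          LocalDate.parse(raw); // ISO-8601, e.g. "2024-01-31"
        } catch (DateTimeParseException e) {
          // Catch the parse exception here so callers only ever see the two
          // documented exception types.
          throw new IllegalArgumentException("Not a date literal: " + raw, e);
        }
        break;
      default:
        // Nested/complex types: an implementation limitation per the thread.
        throw new UnsupportedOperationException(
            "Kernel does not support column defaults for " + type);
    }
  }

  // Convenience wrapper used below; returns false instead of throwing.
  public static boolean isValidLiteral(SimpleType type, String raw) {
    try {
      validateLiteral(type, raw);
      return true;
    } catch (IllegalArgumentException | UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isValidLiteral(SimpleType.DATE, "2024-01-31")); // true
    System.out.println(isValidLiteral(SimpleType.DATE, "not-a-date")); // false
    System.out.println(isValidLiteral(SimpleType.MAP, "{}"));          // false
  }
}
```

With this shape, a caller such as the iceberg-compat validator only needs to translate `IllegalArgumentException` into its own error type and can let `UnsupportedOperationException` propagate.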
SchemaUtils.validateSchema(
    schema,
    isColumnMappingModeEnabled(mappingMode),
    TableFeatures.isPropertiesManuallySupportingTableFeature(
        tableProperties, ALLOW_COLUMN_DEFAULTS_W_FEATURE),
    TableConfig.ICEBERG_COMPAT_V3_ENABLED.fromMetadata(tableProperties));
I think @amogh-jahagirdar merged a PR that changes where we do this validation -- you should be able to just use the protocol for this after rebasing
  }
}

test("block writing to tables with default values") {
can you add a test case with default value for nested column?
    false))
ColumnDefaults.validateSchema(correctSchema, true, true)
val e = intercept[KernelException] {
  ColumnDefaults.validateSchema(correctSchema, true, false)
Test this case as well with a schema with no default values? Since that was the specific behavior we decided on
@@ -61,7 +64,11 @@ class SchemaUtilsSuite extends AnyFunSuite {
    .add("b", INTEGER)
    .add("dupColName", StringType.STRING)
  expectFailure("dupColName") {
    validateSchema(schema, false /* isColumnMappingEnabled */ )
    validateSchema(
instead of doing this elsewhere can we add a test helper method?
then these can each be args with default values so that (a) the booleans are named and (b) we don't need to repeat them for every call
      schema,
      true,
      false,
      false)
  }
}
Can we add a simple defaultColumns test case here? Basically to verify that `ColumnDefaults.validateSchema` is called as part of `SchemaUtils.validateSchema`. Can be super simple.
@@ -1078,6 +1162,7 @@ class SchemaUtilsSuite extends AnyFunSuite {
  validateUpdatedSchemaAndGetUpdatedSchema(
ideally these would also be a test method, but good to not do this in this PR since it's already unideal
"delta.enableIcebergCompatV3" -> "true", | ||
"delta.columnMapping.mode" -> "id") | ||
|
||
test("allow default value in schema when the table feature is enabled") { |
can you separate the valid vs invalid cases into separate tests?
import org.scalatest.funsuite.AnyFunSuite

class ColumnDefaultsSuite extends AnyFunSuite with WriteUtils {
Can we also just test the replace case? Then this test coverage looks good!
Which Delta project/connector is this regarding?
Description
This PR allows Kernel to recognize Delta tables with the AllowColumnDefaults table feature for Iceberg V3.
NOTE: this PR does not IMPLEMENT column defaults in Kernel. It actually blocks writing to Delta tables with default values. We just recognize the table feature so V3 tables can be read by Kernel. Write operations will happen in either DBR or an Iceberg client, not in Kernel.
Current behavior:
How was this patch tested?
UT
Does this PR introduce any user-facing changes?
No
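The feature gate this PR describes (reject schemas that carry defaults when the table feature is off) can be sketched roughly as follows. Class, method, and exception names are illustrative, not Kernel's actual implementation; the `CURRENT_DEFAULT` metadata key follows the Delta protocol's default-columns feature but should be treated as an assumption here.

```java
import java.util.List;
import java.util.Map;

public class ColumnDefaultsGate {

  /** Minimal stand-in for a schema field with string metadata. */
  public static final class Field {
    public final String name;
    public final Map<String, String> metadata;

    public Field(String name, Map<String, String> metadata) {
      this.name = name;
      this.metadata = metadata;
    }
  }

  // Metadata key under which a column's default expression is stored
  // (assumed; mirrors the Delta protocol's default-columns feature).
  public static final String DEFAULT_KEY = "CURRENT_DEFAULT";

  /** Throws if any field carries a default but the table feature is disabled. */
  public static void checkDefaultsAllowed(List<Field> schema, boolean featureEnabled) {
    for (Field f : schema) {
      if (f.metadata.containsKey(DEFAULT_KEY) && !featureEnabled) {
        throw new IllegalStateException(
            "Found column default on '" + f.name
                + "' but the table does not support the columnDefaults table feature");
      }
    }
  }

  // Boolean convenience wrapper for the check above.
  public static boolean defaultsAllowed(List<Field> schema, boolean featureEnabled) {
    try {
      checkDefaultsAllowed(schema, featureEnabled);
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    List<Field> schema = List.of(
        new Field("id", Map.of()),
        new Field("ts", Map.of(DEFAULT_KEY, "now()")));
    System.out.println(defaultsAllowed(schema, true));  // true
    System.out.println(defaultsAllowed(schema, false)); // false
  }
}
```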