
Conversation

@harperjiang (Contributor) commented Aug 5, 2025

Which Delta project/connector is this regarding?

  • [ ] Spark
  • [ ] Standalone
  • [ ] Flink
  • [x] Kernel
  • [ ] Other (fill in here)

Description

This PR allows Kernel to recognize Delta tables with the AllowColumnDefaults table feature for Iceberg V3.
NOTE: this PR does not IMPLEMENT column defaults in Kernel. It actually blocks writing to Delta tables with default values. We just recognize the table feature so that V3 tables can be read by Kernel. Write operations will happen in either DBR or an Iceberg client, not in Kernel.

Current behavior:

  1. When users change metadata via Kernel, we perform checks on columns with added/changed default values. Columns with unchanged or removed default values do not trigger the check.
  2. In the check, only primitive fields are allowed to have default values. VARIANT and nested types are not allowed.
  3. The default value must be a valid literal; see the method {ColumnDefaults.validateLiteral}.
  4. Writing actual data rows to a table with this table feature is blocked (see the sketch below).
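
For orientation, here is a hedged sketch of how the writer-side guard might detect defaults in a schema (SchemaIterable and the CURRENT_DEFAULT field-metadata key both appear in the diff discussed below; the method itself is illustrative, not the PR's actual code):

// Illustrative: returns true if any field in the schema declares a default value.
private static boolean schemaHasDefaults(StructType schema) {
  for (SchemaIterable.SchemaElement element : new SchemaIterable(schema)) {
    if (element.getField().getMetadata().get("CURRENT_DEFAULT") != null) {
      return true; // this field declares a default value
    }
  }
  return false;
}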

How was this patch tested?

UT

Does this PR introduce any user-facing changes?

No

@harperjiang force-pushed the columndefault branch 9 times, most recently from dfeca68 to 289f411 (August 6, 2025 07:08)
Comment on lines 25 to 31
public static boolean supports(Row transactionState, TableFeature feature) {
  // True iff the transaction state's configuration marks this feature as supported.
  return FEATURE_PROP_SUPPORTED.equals(
      TransactionStateRow.getConfiguration(transactionState)
          .getOrDefault(
              TableFeatures.SET_TABLE_FEATURE_SUPPORTED_PREFIX + feature.featureName(),
              "unknown"));
}
Collaborator

Move the validation to ColumnDefaults, just like blockIfColumnMappingEnabled in ColumnMapping.

Contributor Author

Created a validation method ColumnDefaults.blockIfEnabled.

It seems there is not yet a common TableFeature.supports util, so I suggest we keep this one. WDYT?
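
A hedged sketch of what such a guard could look like, reusing the supports helper quoted above (blockIfEnabled and ALLOW_COLUMN_DEFAULTS_W_FEATURE appear elsewhere in this PR; the body and message here are assumptions):

// Illustrative only; per a later comment, the real implementation was changed
// to inspect the schema for default values instead of the feature flag.
public static void blockIfEnabled(Row transactionState) {
  if (supports(transactionState, TableFeatures.ALLOW_COLUMN_DEFAULTS_W_FEATURE)) {
    throw new UnsupportedOperationException(
        "Writing to a Delta table with the allowColumnDefaults table feature is not supported");
  }
}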

@harperjiang force-pushed the columndefault branch 4 times, most recently from dbe0406 to 837c4ec (August 7, 2025 01:59)
@harperjiang requested a review from vkorukanti (August 7, 2025 02:00)
try {
  String stripped = stripQuotationMark(value);
  if ((type instanceof StringType) || (type instanceof BinaryType)) {
    return;
Collaborator

If we have the expression "a+b(x)", will this return successfully?

@harperjiang (Contributor Author) Aug 7, 2025

Yes, and it will be treated as a literal string, not an expression. I updated the validation to require String/Binary values to always be enclosed in quotes.

"CURRENT_TIMESTAMP()" is valid and will be treated as a literal.

CURRENT_VALUE() is invalid.
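
A minimal sketch of that rule (isQuotedLiteral is a hypothetical helper name; the PR's actual validation may differ):

// Illustrative: a String/Binary default must be enclosed in matching quotes,
// so "CURRENT_TIMESTAMP()" is accepted as a plain literal while CURRENT_VALUE() is rejected.
private static boolean isQuotedLiteral(String value) {
  return value.length() >= 2
      && ((value.startsWith("\"") && value.endsWith("\""))
          || (value.startsWith("'") && value.endsWith("'")));
}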

Comment on lines 56 to 58
if (field.getMetadata().contains(DEFAULT_VALUE_METADATA_KEY)) {
  validateLiteral(
      field.getDataType(), field.getMetadata().getString(DEFAULT_VALUE_METADATA_KEY));
Collaborator

I think we want to validate only if the expression has changed, right? In the current code we always validate.

Contributor Author

Both options (always validate, or validate only on change) satisfy the Iceberg requirements, and I am okay with either.

Following the comment, I changed it to validate only on change.

}
}

private static String stripQuotationMark(String input) {
Collaborator

does this work? input.replaceAll("^[\"']+|[\"']+$", "");

@harperjiang (Contributor Author) Aug 7, 2025

This will not work, as it also strips mismatched quotes such as 'xxx" and "xxx'.
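
A minimal sketch of a matching-quote-aware version, assuming only a single pair of identical enclosing quotes should be removed (the helper name comes from the diff; the body is illustrative):

// Illustrative only: strips one pair of *matching* enclosing quotes.
private static String stripQuotationMark(String input) {
  if (input.length() >= 2) {
    char first = input.charAt(0);
    char last = input.charAt(input.length() - 1);
    if (first == last && (first == '"' || first == '\'')) {
      return input.substring(1, input.length() - 1);
    }
  }
  return input;
}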

Comment on lines 52 to 55
/**
* Validate Column Default value changes in the provided metadata. 1. Only the added/changed
* default value is checked. 2. Kernel only supports literal default values. See
* {validateLiteral}.
Collaborator

Sorry, can you clarify what "checked" means?

Collaborator

Can you also clarify the allowed actions?

  • Can you remove defaults from existing columns?
  • Can you add defaults to existing columns?
  • Can you add new columns with a default?
  • Can you change the default value for an existing column?
  • etc.

I'd like to better understand what this "validation" should be exactly. I have a feeling it might make sense for this code, instead of being separate, to be part of the schema evolution validation code or other schema validation code added by @amogh-jahagirdar.

Also, can you clarify what needs to be validated for a new table vs. an existing table?

Collaborator

Also wondering about potential dependencies with type widening? cc @emkornfield

Collaborator

+100 on the new tables. I'm a little confused because I think we would do 0 validation in that case currently?

Collaborator

Is the current validation just that the type matches (for existing tables)?

Honestly, why do we need to only check this if the metadata has changed between new and old? It probably isn't too costly to just check that the type matches for all default columns and only validate the new schema (no need to compare between old and new)?

I think I still need to better understand if there's any other validation that we should be doing besides that for schema evolution, however.

@harperjiang (Contributor Author) Aug 11, 2025

Covered all answers here.

> can you clarify what "checked" means

"Check" means making sure the default value satisfies items 2/3/4 listed in the PR description.

> Can you also clarify the allowed actions?

> • Can you remove defaults from existing columns?

Yes.

> • Can you add defaults to existing columns?

Yes.

> • Can you add new columns with a default?

Yes.

> • Can you change the default value for an existing column?

Yes.

> can you clarify what needs to be validated for a new table vs an existing table

> on the new tables. I'm a little confused because I think we would do 0 validation in that case currently?

The validation of the data types and default values is the same.

For the create-table case we NEED to check the data type and the literal values. I think the recent refactoring of TransactionMetadataFactory caused my confusion: I did not realize that new metadata and updated metadata are stored separately. I will cover the CREATE TABLE case.

> Honestly, why do we need to only check this if the metadata has changed between new and old? It probably isn't too costly to just check the type matches for all default columns and only validate the new schema (no need to compare between old and new)?

This is the result of a discussion with @vkorukanti; see #5031 (comment). The idea is that Kernel only provides limited support for this table feature, and there can be existing tables with default values that Kernel does not support. As long as customers do not change these default values through Kernel, we want to leave them as is.

Collaborator

Left a comment below on type widening.

newDefaults.forEach(
    (path, field) -> {
      String newDefaultValue = field.getMetadata().getString(DEFAULT_VALUE_METADATA_KEY);
      StructField existingField = oldDefaults.getOrDefault(path, defaultField);
Collaborator

Sorry, why do we do this? I think we can have a better solution for handling the case where oldDefaults doesn't contain path than plugging in this fake field, right? This is super unintuitive.

@harperjiang (Contributor Author) Aug 11, 2025

We handle the cases where:

  1. oldDefaults does not contain the path
  2. oldDefaults contains the path (has the field) but the metadata changed.

Please kindly provide your suggestion for improving the code.
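
One possible shape without the sentinel field, as a hedged sketch (assumes java.util.Objects and java.util.Optional; validateLiteral, oldDefaults/newDefaults, and DEFAULT_VALUE_METADATA_KEY come from the diff):

// Illustrative: compare against an Optional lookup instead of a sentinel StructField.
newDefaults.forEach(
    (path, field) -> {
      String newDefaultValue = field.getMetadata().getString(DEFAULT_VALUE_METADATA_KEY);
      String oldDefaultValue =
          Optional.ofNullable(oldDefaults.get(path))
              .map(f -> f.getMetadata().getString(DEFAULT_VALUE_METADATA_KEY))
              .orElse(null);
      // Validate only when the default was added or changed.
      if (!Objects.equals(oldDefaultValue, newDefaultValue)) {
        validateLiteral(field.getDataType(), newDefaultValue);
      }
    });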

public static final TableFeature ALLOW_COLUMN_DEFAULTS_W_FEATURE =
new AllowColumnDefaultsTableFeature();

private static class AllowColumnDefaultsTableFeature extends TableFeature.WriterFeature {
Collaborator

@vkorukanti WDYT, should this be auto-enabled by metadata?

Contributor Author

Talked with @vkorukanti. This is enabled by the schema (the schema contains a default value), and it does not need a config key to enable it.

Collaborator

Yes -- if we want it to be auto-enabled, though, then it should extend FeatureAutoEnabledByMetadata and override metadataRequiresFeatureToBeEnabled so that it can be auto-enabled by the schema. See VariantType or TimestampNtz for examples.
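
For reference, a hedged sketch of that shape (modeled on the reviewer's pointers to VariantType/TimestampNtz; the exact base types and method signature are assumptions, and schemaContainsDefaultValues is a hypothetical helper):

// Illustrative only; actual Kernel signatures may differ.
private static class AllowColumnDefaultsTableFeature extends TableFeature.WriterFeature
    implements FeatureAutoEnabledByMetadata {
  @Override
  public boolean metadataRequiresFeatureToBeEnabled(Protocol protocol, Metadata metadata) {
    // Auto-enable the feature when any schema field carries a default-value entry.
    return schemaContainsDefaultValues(metadata.getSchema());
  }
}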

@harperjiang (Contributor Author) Sep 9, 2025

We are intentionally not doing this, to keep the behavior in sync with the Edge version. This can be updated later, though.

*/
public class ColumnDefaults {

private static final String DEFAULT_VALUE_METADATA_KEY = "CURRENT_DEFAULT";
Collaborator

cc @raveeram-db @scottsand-db @emkornfield: another scenario regarding our StructField APIs? Another example of unstructured metadata stored in the FieldMetadata.

Collaborator

:( was there any bandwidth allocated for trying to address it this quarter?

Collaborator

We should at least have a decision on the direction we want to go, and do the right thing for this modelling (e.g., metadata-only vs. modelling it directly on FieldSchema).

@@ -169,6 +170,9 @@ static CloseableIterator<FilteredColumnarBatch> transformLogicalData(
boolean isIcebergCompatEnabled =
isIcebergCompatV2Enabled(transactionState) || isIcebergCompatV3Enabled(transactionState);
blockIfColumnMappingEnabled(transactionState);
// We recognize the AllowColumnDefaults feature for Iceberg v3
// but do not support writing with it yet
ColumnDefaults.blockIfEnabled(transactionState);
@allisonport-db (Collaborator) Aug 9, 2025

Yeah, +1 to what I said before: this is not tested, and I think it likely wouldn't fail correctly. This potentially needs to look at the schema. For testing: maybe look at how blockIfColumnMappingEnabled(transactionState) was tested? It needs to be tested E2E, not just how it's tested in the TableFeaturesSuite.

Collaborator

Another option is adding writerFeatures to TransactionStateRow

Contributor Author

Changed it to look at whether the schema contains a default value.

Collaborator

Unit tests are great, but we also need some E2E tests in the kernelDefaults module for the validation checks (otherwise how can we know they are correctly executed as part of the metadata validation?).

They should cover new tables & existing tables with schema evolution.

Contributor Author

Thanks for the catch. Will add that.

@@ -555,6 +557,11 @@ private void validateMetadataChangeAndApplyTypeWidening() {
MaterializedRowTrackingColumn.throwIfColumnNamesConflictWithSchema(getEffectiveMetadata());
}

private void validateColumnDefaultsChanges() {
Collaborator

One more question: should we check/require that the table feature is enabled if we see defaults in the metadata?

Contributor Author

Added a check that requires the table feature to be present if the schema contains a default.

for (SchemaIterable.SchemaElement element : new SchemaIterable(schema)) {
  StructField field = element.getField();
  if (field.getMetadata().get(DEFAULT_VALUE_METADATA_KEY) != null) {
    result.put(element.getNamePath(), field);
Collaborator

What happens if column mapping is enabled?

Contributor Author

Changed to use fieldId as the key when ColumnMapping is enabled in both the old and the new metadata.
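
A hedged sketch of that keying choice (delta.columnMapping.id is the column-mapping field-id key from the Delta protocol; the surrounding shape is illustrative, not the PR's actual code):

// Illustrative: key defaults by the stable column-mapping field id when enabled,
// so renamed columns still line up between the old and new schemas.
String key =
    columnMappingEnabled
        ? String.valueOf(field.getMetadata().getLong("delta.columnMapping.id"))
        : element.getNamePath();
result.put(key, field);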

}

private static void validateLiteral(DataType type, String value) {
try {
Collaborator

We don't have literal validation anywhere else in Kernel?

Contributor Author

Checked with @vkorukanti, and we don't believe there is anything ready for use.

throw new IllegalArgumentException("invalid default value " + value + " for " + type);
}
} else if (type instanceof BooleanType) {
  // Note: Boolean.parseBoolean never throws; unrecognized input silently yields false.
  Boolean.parseBoolean(stripped);
Collaborator

Do the Java built-ins provide good error messages?

Collaborator

It would be good to ensure a consistent, hopefully actionable, error message for all cases where this fails.

Contributor Author

Exceptions thrown from here will be captured and will tell the user they need to update the default value.

Comment on lines 52 to 55
/**
* Validate Column Default value changes in the provided metadata. 1. Only the added/changed
* default value is checked. 2. Kernel only supports literal default values. See
* {validateLiteral}.
Collaborator

Left a comment below on type widening.

} else if (type instanceof BooleanType) {
  Boolean.parseBoolean(stripped);
} else if (type instanceof DateType) {
  LocalDate.parse(stripped, DateTimeFormatter.ISO_LOCAL_DATE);
Collaborator

Are writers supposed to update the default value if the type changes? I think BigDecimal might be OK, but it isn't clear to me if DATE -> TimestampNTZ is handled correctly.

Contributor Author

There is no support for type widening now. E.g., when a client changes a column type, they are REQUIRED to also change the default values; we don't support providing a timestamp value for a date field.

We may improve this in the future by allowing more compatible values. For now this table feature is meant to be a minimal working version.

Collaborator

OK. As a follow-up, we should make sure that type widening fails appropriately if the value isn't changed.

String newDefaultValue = field.getMetadata().getString(DEFAULT_VALUE_METADATA_KEY);
StructField existingField = oldDefaults.getOrDefault(path, defaultField);
if (!Objects.equals(
    existingField.getMetadata().getString(DEFAULT_VALUE_METADATA_KEY), newDefaultValue)) {
Collaborator

It seems that we should only validateLiteral if Iceberg V3 is enabled. This check is narrower than the Delta spec, which seems to allow arbitrary SQL expressions.

@harperjiang (Contributor Author) Aug 11, 2025

Exactly. As mentioned above, this is a minimal working implementation of the default value table feature that is meant to be used mainly with Iceberg V3. We support only literals now because:

  1. We don't expect a lot of non-V3 usage of this table feature.
  2. Proper validation of SQL expressions may require support from engines, which is missing now.

Due to time limits, we do not plan to implement checks for SQL expressions and type widening for now. Thus it is safer to just apply this check to all cases rather than relax it for the non-V3 case. We don't expect many non-V3 use cases anyway.

Collaborator

Sorry, I might be confused, but it looks like validateColumnDefaultsChanges is called regardless of whether the table actually has Iceberg compat enabled? If so, this is a potentially breaking change. At the very least we should have a helpful error message in the raised exception that distinguishes between:

  1. Iceberg compat is enabled (only literal values are allowed for defaults).
  2. Iceberg compat is not enabled (the default value update was not a literal, but this is just a feature gap in Kernel).

@harperjiang (Contributor Author) Aug 12, 2025

@emkornfield Edited my previous comments a bit.

We are not ready to parse and validate general SQL expressions in the non-V3 case. If we apply the checks only to the V3 case, the result will be:

  • V3 enabled -> current implementation (literal only)
  • V3 not enabled -> no check on the default value.

In the worst case, a user could use Kernel to generate a Delta table with a default value that no engine recognizes. Are we okay with that?

cc @vkorukanti, as I think we have discussed and agreed that we don't want Kernel to generate unreadable Delta tables.

Regarding your concern about a "breaking change", I don't quite get it. The check is only activated if a user adds or updates a column with a default value. If they don't change the default values in the schema, this check is not activated, and there is no impact to them at all. Please let me know if this is the case you have in mind.

Collaborator

Was chatting with @harperjiang offline regarding this. To avoid confusion, let's perhaps start with allowing this table feature to be set only on IcebergWriterCompatV3 tables, folks? That way it's scoped, and it's clear that non-V3 tables have a feature gap that we can address in the future. Does that sound good @emkornfield @vkorukanti?

Collaborator

> In the worst case, a user could use Kernel to generate a Delta table with a default value that no engine recognizes. Are we okay with that?

I think it is fine to fail, but the reason for the failure is different. For Iceberg V3 we fail because Iceberg only supports literals. For non-Iceberg-V3 we fail because Kernel doesn't support parsing arbitrary expressions. As a user of Kernel, I think the difference in the reason for failure should be made clear if possible.

> Regarding your concern of "breaking change", I don't quite get it.

IIUC, before this change a user could put the default value into field properties (perhaps without the feature flag); if that value was a valid expression, this new code would cause the direct put into field properties to fail. I'd guess consumers aren't doing this in general, but that was the breaking change I was referring to.

Collaborator

> To avoid confusion, let's perhaps start with allowing this table feature to be set only on IcebergWriterCompatV3 tables?

This works for me as well.

Contributor Author

Sounds good to me. Will follow @raveeram-db's suggestion.

@emkornfield (Collaborator)

> There is one question remaining open: do we want to limit the default value validation to only when V3 is enabled? My main concern here is that when V3 is not enabled, we are unable to perform a good check, and may allow Kernel to create "invalid" Delta tables that are not readable by other engines. IMO it is safer to perform a rigid check for now, and relax it gradually in the future. Please advise.

I think this is fine. I responded to the prior comment, but I think we should be telling users explicitly why the check failed (i.e., it is V3 and the value must be a literal, or it is not V3 and we simply haven't implemented support for expressions).

@harperjiang (Contributor Author)

@emkornfield @raveeram-db thanks for the discussion.
Now we make V3 mandatory for default values to be changed in Kernel. Please have a look.

@allisonport-db (Collaborator)

> There is one question remaining open: do we want to limit the default value validation to only when V3 is enabled? My main concern here is that when V3 is not enabled, we are unable to perform a good check, and may allow Kernel to create "invalid" Delta tables that are not readable by other engines. IMO it is safer to perform a rigid check for now, and relax it gradually in the future. Please advise.

I think I am happy for it to be more restrictive now; we can always add support later on. Agree with @emkornfield that the error message is important. -- To that note, I strongly feel that this check for it being a literal should definitely happen as part of the checks in IcebergCompatV3MetadataValidatorAndUpdater (it can share a helper method, but it should definitely be there). (Side note: where can I see the icebergCompatV3 RFC? I didn't realize this default value validation was a requirement of that until now.) Later on, if we expand support, we still want that restrictive check to be there.

Along those lines, I commented this somewhere, but can you look into adding this validation to SchemaUtils.validateUpdatedSchemaAndGetUpdatedSchema and SchemaUtils.validateSchema? I don't think it needs to be a separate check, and it should go along with where we validate other field metadata.

@allisonport-db (Collaborator) commented Aug 13, 2025

I'm still not sure I'm on board with only checking updated default values -- shouldn't we not support writing to a table with any default values that we don't understand?

I guess we don't technically support writing data to a table with any default values at all... but I'm not completely convinced of a good reason to only check the updated ones either.

@harperjiang (Contributor Author) commented Aug 13, 2025

@allisonport-db

> I strongly feel that this check for it being literal should definitely happen as part of the checks in IcebergCompatV3MetadataValidatorAndUpdater (it can share a helper method, but it should definitely be there). (Side note, where can I see the icebergCompatV3 RFC? I didn't realize this default value validation was a req of that until now). Later on, if we expand support, we still want that restrictive check to be there.

I agree that in the final stage we will have two separate validations:

  • a validation that allows expressions, for the Default Values table feature,
  • a more restrictive (literal-only) validation for the CompatV3 table feature.

Right now the expression validation is not ready, and we need some kind of validation for the Default Value table feature. The solution is to use the literal-only validation in Default Value for now.

I also agree that once we start working on the "complete" Default Value validation, the literal-only check should be moved to a separate V3-only check. But maybe not now.

In addition, CompatV3 is an internal feature which does not have a public RFC yet.

> I'm still not sure I'm on board with only checking updated default values -- shouldn't we not support writing to a table with any default values that we don't understand?

> I guess we don't technically support writing data to a table with any default values at all... but I'm not completely convinced of a good reason to only check the updated ones either.

The decision to check only changed values follows @vkorukanti's suggestion. I think the main purpose is to allow Kernel to read as many Delta tables as possible. I am fine with either way and will leave the decision to you all. Please let me know.

@harperjiang (Contributor Author) commented Aug 21, 2025

@allisonport-db @raveeram-db: synced with @vkorukanti offline. We are okay with applying the checks to all tables for now. Updating the PR.

@allisonport-db (Collaborator)

SGTM. This also only applies to table writes, not reads! So it should not have any impact on reads.

@allisonport-db (Collaborator)

So just for clarity, if we see a default column we will require that:

  • The default column table feature is enabled.
  • IcebergCompatV3 is enabled.
  • The expression is a literal expression compatible with IcebergCompatV3.
  • The expression datatype matches the column datatype.

And:

  • We will check this for the entire schema instead of only checking for changed columns, for all the create/replace/update cases.
  • We will throw an error on data-write side if any default columns are present.

Overall, this SGTM and makes me feel a lot better about safety. We can always loosen restrictions in the future.

Btw, I think if you update SchemaUtils.validateSchema it is run in all the create/replace/update cases. I would also still encourage you to implement the check for literal expressions as part of IcebergCompatV3MetadataValidatorAndUpdater -- we will need it there in the future anyways and that will allow us to throw a more informative exception as @emkornfield called out earlier.

@harperjiang force-pushed the columndefault branch 3 times, most recently from 7daf89e to 7cd3452 (September 3, 2025 21:03)
@harperjiang (Contributor Author)

@allisonport-db all comments addressed. Please have a look at your convenience. Thanks!

@allisonport-db (Collaborator) left a comment

Thanks for the update! A few comments on how we enforce the icebergCompatV3 requirement. Since this is a temporary requirement in Kernel for now, it shouldn't be baked into the table feature framework, and we should try to throw a more informative error message.

Will review tests on next pass. Ty!

@harperjiang force-pushed the columndefault branch 6 times, most recently from 14fa6b5 to 9de3670 (September 9, 2025 19:45)
throw new KernelException(
"This table does not enable table features for setting column defaults");
}
} else if (!isIcebergCompatV3Enabled) {
Collaborator

Isn't this wrong, and doesn't it always throw an exception regardless of whether there are default values? Also, it's concerning that this is not caught by a test?!

The flow of this function really should be (sketched below):

  • Collect default values
  • If default values are present:
    • If !isDefaultColumnsEnabled => throw exception
    • If !isIcebergCompatV3Enabled => throw a specific exception
    • Validate default values
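
A hedged sketch of that flow (KernelException, validateLiteral, and getRawDefaultValue appear in this PR; collectFieldsWithDefaultValues and the messages are assumptions):

// Illustrative: materialize the defaults into a list so they can be inspected
// more than once (a Stream can only be consumed once).
List<StructField> defaultFields = collectFieldsWithDefaultValues(schema);
if (!defaultFields.isEmpty()) {
  if (!isDefaultColumnsEnabled) {
    throw new KernelException(
        "Found column defaults in the schema but the table does not support the"
            + " allowColumnDefaults table feature");
  }
  if (!isIcebergCompatV3Enabled) {
    throw new KernelException(
        "Kernel currently supports column defaults only on icebergCompatV3 tables");
  }
  defaultFields.forEach(f -> validateLiteral(f.getDataType(), getRawDefaultValue(f)));
}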

Collaborator

If the intention is to always throw when defaultColumns is enabled and icebergCompatV3 is not, regardless of whether any default values are actually present, then can we make that super clear -- the way this is written is very confusing.

@harperjiang (Contributor Author) Sep 10, 2025

> If the intention is to always throw when defaultColumns is enabled and icebergCompatV3 is not, regardless of whether any default values are actually present

This is the goal. We want to make sure V3 is enabled whenever the Default Value table feature is used in Kernel.

> the way this is written is very confusing.

I am not sure which part is confusing. I think the error message clearly indicates that the two features need to be enabled together. I can probably tweak the order of the "if/else". If you can clarify the confusing part, I will be happy to follow.

Does the following look good to you?

if (isEnabled && !isIcebergCompatV3Enabled) {
  throw ...
}
if (!isEnabled) {
  if (defaultValues.findAny().isPresent()) {
    throw ...
  }
} else {
  ...
}

Collaborator

Yeah, I think that is clearer -- and can you add a comment explaining why we do this?

Collaborator

Future readers won't understand why we have this restriction for now.

Stream<StructField> defaultValues = extractFieldsWithDefaultValues(schema);
if (!isEnabled) {
  if (defaultValues.findAny().isPresent()) {
    throw new KernelException(
Collaborator

This should be an InvalidTableException

@harperjiang (Contributor Author) Sep 10, 2025

Pushing back on this.

  1. We have similar methods (e.g., DeltaErrors.unsupportedDataType) that use a KernelException.
  2. InvalidTableException needs the table path and version, which are not available in SchemaUtils.validateSchema. Passing that info into this method needs some work to refactor several APIs in SchemaUtils. This can be done in a later refactoring, but I wonder if the change is worth it.

Comment on lines 133 to 134
throw new UnsupportedOperationException(
"Kernel does not support column defaults for " + type.toString());
Collaborator

What types fall here? Are they unsupported by Kernel, or unsupported for default columns in general? I.e., I assume you cannot have a default value for a Map type. In that case, the error should be something like InvalidTableException, since that is not a valid state for a Delta table.

@harperjiang (Contributor Author) Sep 10, 2025

Unsupported types include Map/List/Variant/Geo, etc. These limitations are not part of the protocol spec, so I would say this is an implementation limitation rather than an invalid table.

In addition, the current API design of SchemaUtils makes it hard to use InvalidTableException there (table info is not passed in). This can be improved later, probably in a separate PR, with a refactoring of the API.

field -> {
  try {
    validateLiteral(field.getDataType(), getRawDefaultValue(field));
  } catch (IllegalArgumentException | UnsupportedOperationException e) {
Collaborator

I don't really like that we catch and rethrow exceptions both here and within validateLiteral. I'm also not really sure what the expected CUJs for the various cases are here; can you explain?

@harperjiang (Contributor Author) Sep 10, 2025

Well, the design of catching exceptions within validateLiteral is to prepare it to be exposed as a public method later. But I am totally fine with not catching exceptions in there for now.

By "the expected CUJ for the various cases", do you mean when we use the different exceptions? Basically, IllegalArgumentException is thrown for an invalid default value on a given column (e.g., a non-date literal for a date column), and UnsupportedOperationException is thrown for columns that should not have a default value. If that's not what you are asking, please clarify.

@harperjiang (Contributor Author)

@allisonport-db updated. Please see my comments

@allisonport-db (Collaborator) left a comment

Thanks for the updates! A few more requests, plus I took a look at the tests (they look awesome so far, thanks!).

if (!isEnabled) {
  if (defaultValues.findAny().isPresent()) {
    throw new KernelException(
        "This table does not enable table features for setting column defaults");
Collaborator

Can you make the error message clearer? E.g., "Found column defaults in the schema but the table does not support column defaults with the columnDefaults table feature."

Collaborator

Also, please put this in DeltaErrors, plus a TODO to make this an InvalidTableException.

} catch (DateTimeParseException
| IllegalArgumentException
| UnsupportedOperationException e) {
throw DeltaErrors.icebergCompatRequiresLiteralDefaultValue(
Collaborator

I think my earlier comment was about which exception we throw. For example, if we find a column default for a data type we don't support defaults for, we should throw that exception, not the "literal value required" one.

Collaborator

Can we give validateLiteral clearly defined docs + throws behavior?

@throws UnsupportedOperationException if Kernel doesn't support default values for the data type
@throws IllegalArgumentException if the value is not a literal value

--> In validateLiteral we should only throw those exceptions (so we need to catch the parse exceptions there).
--> Here, we can be specific about our behavior, i.e., catch IllegalArgumentException and convert it to icebergCompatRequiresLiteralDefaultValue; probably just let UnsupportedOperationException fail upward.
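
A hedged sketch of the documented contract being asked for (the @throws list follows this comment; the DateType branch mirrors the diff above, and the rest of the body is an assumption):

/**
 * Validates that {@code value} encodes a literal default for {@code type}.
 *
 * @throws UnsupportedOperationException if Kernel doesn't support default values for the type
 * @throws IllegalArgumentException if the value is not a valid literal value for the type
 */
private static void validateLiteral(DataType type, String value) {
  try {
    if (type instanceof DateType) {
      LocalDate.parse(stripQuotationMark(value), DateTimeFormatter.ISO_LOCAL_DATE);
    } // ... other type branches elided ...
  } catch (DateTimeParseException | NumberFormatException e) {
    // Normalize parser failures into the single documented exception type.
    throw new IllegalArgumentException("invalid default value " + value + " for " + type, e);
  }
}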

Comment on lines +698 to +703
SchemaUtils.validateSchema(
    schema,
    isColumnMappingModeEnabled(mappingMode),
    TableFeatures.isPropertiesManuallySupportingTableFeature(
        tableProperties, ALLOW_COLUMN_DEFAULTS_W_FEATURE),
    TableConfig.ICEBERG_COMPAT_V3_ENABLED.fromMetadata(tableProperties));
Collaborator

I think @amogh-jahagirdar merged a PR that changes where we do this validation -- you should be able to just use the protocol for this after rebasing

}
}

test("block writing to tables with default values") {
Collaborator

Can you add a test case with a default value for a nested column?

false))
ColumnDefaults.validateSchema(correctSchema, true, true)
val e = intercept[KernelException] {
ColumnDefaults.validateSchema(correctSchema, true, false)
Collaborator

Can you test this case as well with a schema with no default values? Since that was the specific behavior we decided on.

@@ -61,7 +64,11 @@ class SchemaUtilsSuite extends AnyFunSuite {
.add("b", INTEGER)
.add("dupColName", StringType.STRING)
expectFailure("dupColName") {
validateSchema(schema, false /* isColumnMappingEnabled */ )
validateSchema(
Collaborator

Instead of doing this everywhere, can we add a test helper method?

Collaborator

Then these can each be args with default values so that (a) the booleans are named and (b) we don't need to repeat them for every call.

schema,
true,
false,
false)
}
}
Collaborator

Can we add a simple defaultColumns test case here? Basically to verify that ColumnDefaults.validateSchema is called as part of SchemaUtils.validateSchema. It can be super simple.

@@ -1078,6 +1162,7 @@ class SchemaUtilsSuite extends AnyFunSuite {
validateUpdatedSchemaAndGetUpdatedSchema(
Collaborator

Ideally these would also use a test helper method, but it's fine not to do that in this PR, since it's already not ideal.

"delta.enableIcebergCompatV3" -> "true",
"delta.columnMapping.mode" -> "id")

test("allow default value in schema when the table feature is enabled") {
Collaborator

Can you separate the valid vs. invalid cases into separate tests?


import org.scalatest.funsuite.AnyFunSuite

class ColumnDefaultsSuite extends AnyFunSuite with WriteUtils {
Collaborator

Can we also just test the replace case? Then this test coverage looks good!
