Unexpected behavior when executing CREATE OR REPLACE TABLE AS SELECT with REST Catalog #12738

Open
baotran306 opened this issue Apr 7, 2025 · 0 comments

Labels
bug Something isn't working
Apache Iceberg version

1.7.2

Query engine

Spark

Please describe the bug 🐞

Describe bug

When I run the CREATE OR REPLACE TABLE AS SELECT command, it works well in several cases, including (see the sketch after this list):

  • Replacing a table with its structure only (using LIMIT 0 in the SELECT statement)
  • Replacing a non-partitioned table with a partitioned table
  • Replacing a table partitioned by one identity column with a different identity partition column
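
For example, the structure-only case looks roughly like this (a minimal sketch; source_tbl and target_tbl are placeholder names, not the tables from the reproduction code below):

# Hypothetical structure-only replace: LIMIT 0 copies the schema but no rows
# (source_tbl / target_tbl are placeholder names, not the tables used in the repro below)
spark.sql("""
    CREATE OR REPLACE TABLE lakekeeper.risk_mon_out.target_tbl
    USING iceberg
    AS SELECT * FROM lakekeeper.risk_mon_out.source_tbl LIMIT 0
""")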

However, there is one case I tested that was unsuccessful: replacing a table that has a transform partition (bucket) with a non-partitioned one.

Steps to reproduce

First, I created an empty non-partitioned table using the CREATE TABLE command and inserted test data. Next, I created a partitioned table with a different structure in the same way.


Then I tried to replace the existing partitioned table with the contents of my non-partitioned table using the CREATE OR REPLACE TABLE AS SELECT command. I expected the table to be completely replaced, both schema and data. Instead, it failed with: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 24.0 failed 1 times, most recent failure: Lost task 0.0 in stage 24.0 (TID 50) (10.16.14.4 executor driver): org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: id_bucket: bucket[16](1)


So I tried one more time. Before replacing the table, I dropped all partition fields using ALTER TABLE ... DROP PARTITION FIELD. However, it still failed with the same error. I don't understand why it is still checking for partition fields when I had already dropped them before replacing the table. (I have also checked the metadata file, and there is no partition spec anymore after the drop.)


My code: Spark + REST Catalog (catalog_name = 'lakekeeper')

# Create a non-partitioned table and insert test data
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakekeeper.risk_mon_out.my_data_non_partitioned(
        vendor_id int,
        product string,
        price double,
        event_time timestamp      
    )
""")
spark.sql("""
    INSERT INTO lakekeeper.risk_mon_out.my_data_non_partitioned VALUES
          (1, "Car", 2.34, TIMESTAMP '2025-03-01 11:03:01'),
          (2, "Bike", 5.43, TIMESTAMP '2025-03-01 11:19:21'),
          (3, "House", 100.4, TIMESTAMP '2025-03-01 12:43:17'),
          (4, "Plane", 124.4, TIMESTAMP '2025-03-01 13:05:29')
""")
spark.sql("SELECT * FROM lakekeeper.risk_mon_out.my_data_non_partitioned").show()

# Create a partitioned table with a transform partition (bucket)
spark.sql("""
    CREATE TABLE lakekeeper.risk_mon_out.data_sample_test(
        id int,
        category string,
        event_time timestamp      
    )
    PARTITIONED BY (
        bucket(16, id),
        category
    )
""")
spark.sql("""
    INSERT INTO lakekeeper.risk_mon_out.data_sample_test VALUES
          (5, "HOVal", TIMESTAMP '2025-03-01 11:03:09'),
          (6, "NextMi", TIMESTAMP '2025-03-01 12:19:29'),
          (7, "OGel", TIMESTAMP '2025-01-09 06:43:17'),
          (8, "AAAA", TIMESTAMP '2025-01-09 06:05:29')
""")
spark.sql("SELECT * FROM lakekeeper.risk_mon_out.data_sample_test").show()

# Try to replace the partitioned table with the non-partitioned one
try:
    spark.sql("""
        CREATE OR REPLACE TABLE lakekeeper.risk_mon_out.data_sample_test
        USING iceberg
        AS SELECT * FROM lakekeeper.risk_mon_out.my_data_non_partitioned
    """)
except Exception as e:
    print(f"Error 111: {e}")

try:
    spark.sql("ALTER TABLE lakekeeper.risk_mon_out.data_sample_test DROP PARTITION FIELD id_bucket")
    spark.sql("ALTER TABLE lakekeeper.risk_mon_out.data_sample_test DROP PARTITION FIELD category")
    print("Drop done")
    spark.sql("""
        CREATE OR REPLACE TABLE lakekeeper.risk_mon_out.data_sample_test
        USING iceberg
        AS SELECT * FROM lakekeeper.risk_mon_out.my_data_non_partitioned
    """)
except Exception as e:
    print("=====")
    print(f"Error 222: {e}")

Expected Behavior

The table should be replaced entirely, including both the schema and the data.

Environment

Apache Iceberg Version: 1.7.2
Execution Engines Tested: Spark 3.5.2
Catalog Type: REST catalog
Storage: S3-compatible

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time