Spark may write timestamps using the deprecated int96 physical type in Parquet files. Currently, Sail cannot read such data correctly.
Arrow reads int96 as a timestamp with nanosecond unit, while Spark expects microsecond unit, so the valid value ranges differ.
Schema analysis requests (printSchema()) fail because the Arrow data type (nanosecond unit) cannot be converted back to a Spark data type.
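To make the range difference concrete, here is a minimal sketch using only the Python standard library. It assumes timestamps are stored as signed 64-bit counts since the Unix epoch, and it uses Python's own datetime bounds (years 1 through 9999) as a stand-in for the range Spark documents for its timestamp type. The helper names are illustrative and are not part of Sail, Spark, or Arrow.

```python
from datetime import datetime, timedelta, timezone

I64_MIN = -(2**63)
I64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_since_epoch(ts: datetime) -> int:
    """Whole microseconds between the Unix epoch and ts (floored)."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def fits_in_i64_micros(ts: datetime) -> bool:
    """True if ts is representable as int64 microseconds since the epoch."""
    return I64_MIN <= micros_since_epoch(ts) <= I64_MAX


def fits_in_i64_nanos(ts: datetime) -> bool:
    """True if ts is representable as int64 nanoseconds since the epoch."""
    return I64_MIN <= micros_since_epoch(ts) * 1000 <= I64_MAX


def ns_bound(ticks_ns: int) -> datetime:
    """Datetime for an int64 nanosecond count, floored to microseconds."""
    return EPOCH + timedelta(microseconds=ticks_ns // 1000)


# Approximate limits of an int64 nanosecond timestamp.
NS_MIN = ns_bound(I64_MIN)
NS_MAX = ns_bound(I64_MAX)

# A date well inside the microsecond range but outside the nanosecond one.
old = datetime(1500, 1, 1, tzinfo=timezone.utc)

print("nanosecond range:", NS_MIN.isoformat(), "to", NS_MAX.isoformat())
print("1500-01-01 fits in micros:", fits_in_i64_micros(old))
print("1500-01-01 fits in nanos: ", fits_in_i64_nanos(old))
```

Under these assumptions, the nanosecond unit only covers roughly the years 1677 to 2262, so a value such as 1500-01-01 is valid at microsecond precision but cannot be represented at nanosecond precision.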
We should respect the Spark schema (stored as a metadata key) when reading the Parquet file. Casting the timestamp type appears to be possible after the recent upstream fix (apache/arrow-rs#7285), so we should be able to handle this after the next Arrow release.
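As a rough illustration of what respecting the Spark schema could involve, the sketch below looks up the schema that Spark embeds in the Parquet key-value metadata and lists the columns declared as timestamps. It assumes the metadata key is `org.apache.spark.sql.parquet.row.metadata` and that the value is Spark's JSON schema representation; both should be checked against the Spark version in use. The function name and the sample metadata are hypothetical, and this is not how Sail or arrow-rs implements it.

```python
import json

# Assumed key under which Spark stores its schema in the Parquet footer.
SPARK_METADATA_KEY = "org.apache.spark.sql.parquet.row.metadata"


def spark_timestamp_columns(key_value_metadata: dict[str, str]) -> list[str]:
    """Return top-level column names that the Spark schema marks as timestamp.

    Nested fields (structs, arrays, maps) are not inspected in this sketch.
    """
    raw = key_value_metadata.get(SPARK_METADATA_KEY)
    if raw is None:
        # File was not written by Spark, or the key was stripped.
        return []
    schema = json.loads(raw)
    return [
        field["name"]
        for field in schema.get("fields", [])
        if field.get("type") == "timestamp"
    ]


# Hypothetical footer metadata for a file with one long and one timestamp column.
sample_metadata = {
    SPARK_METADATA_KEY: json.dumps(
        {
            "type": "struct",
            "fields": [
                {"name": "id", "type": "long", "nullable": True, "metadata": {}},
                {"name": "ts", "type": "timestamp", "nullable": True, "metadata": {}},
            ],
        }
    )
}

print(spark_timestamp_columns(sample_metadata))
```

A reader could use such a list to decide which int96 columns to read with microsecond unit rather than the default nanosecond unit.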