@benrobby benrobby commented Aug 25, 2025

What changes were proposed in this pull request?

  • This is a follow-up to [SPARK-52821][PYTHON] add int->DecimalType pyspark udf return type coercion #51538. It adds integer-to-decimal type coercion for UDFs with useArrow=True when spark.sql.legacy.execution.pythonUDF.pandas.conversion.enabled=False.
  • To do this, the existing Spark conf spark.sql.execution.pythonUDF.pandas.intToDecimalCoercionEnabled is now forwarded from the worker to the ArrowBatchUDFSerializer and then to the Python->Arrow converter.
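
As a rough illustration of the idea (not the code in this PR), the coercion can be pictured as a converter that is built with the flag value and turns plain ints into decimal.Decimal before the existing Decimal check runs. The names make_decimal_converter and int_to_decimal_coercion_enabled are hypothetical and chosen only for this sketch; the real converter in pyspark/sql/conversion.py may be structured differently.

```python
import decimal
from typing import Any, Callable, Optional


def make_decimal_converter(
    int_to_decimal_coercion_enabled: bool,
) -> Callable[[Any], Optional[decimal.Decimal]]:
    """Hypothetical factory, for illustration only (not the PySpark API)."""

    def convert_decimal(value: Any) -> Optional[decimal.Decimal]:
        if value is None:
            return None
        # bool is a subclass of int, so it is excluded explicitly.
        if (
            int_to_decimal_coercion_enabled
            and isinstance(value, int)
            and not isinstance(value, bool)
        ):
            return decimal.Decimal(value)
        # Mirrors the assertion shown in the traceback of this PR description.
        assert isinstance(value, decimal.Decimal)
        return value

    return convert_decimal


# With the flag on, an int return value is coerced to a Decimal.
coerce = make_decimal_converter(int_to_decimal_coercion_enabled=True)
coerced = coerce(1)

# With the flag off, the same value would hit the assertion instead.
strict = make_decimal_converter(int_to_decimal_coercion_enabled=False)
```

Passing the flag in when the converter is created, rather than reading it per value, matches the description above: the conf is read once on the worker and handed down to the serializer and converter.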

Why are the changes needed?

Python UDFs with useArrow=True and spark.sql.legacy.execution.pythonUDF.pandas.conversion.enabled=False do not support type coercion from int to DecimalType if the target precision of the DecimalType is too low:

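from pyspark.sql.functions import col, udf
from pyspark.sql.types import DecimalType

@udf(returnType=DecimalType(2, 1), useArrow=True)
def test(x):
  # Returns a plain int even though the declared return type is a decimal.
  return 1
spark.range(1,2,1,1).select(test(col('id'))).show()
# expected: (Decimal) 1.0
# actual: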
File "/deps/pyspark/sql/conversion.py", line 314, in convert_decimal
    assert isinstance(value, decimal.Decimal)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Does this PR introduce any user-facing change?

No, spark.sql.execution.pythonUDF.pandas.intToDecimalCoercionEnabled is still off by default, so this is not a behavior change.

How was this patch tested?

  • added unit tests

Was this patch authored or co-authored using generative AI tooling?

No

benrobby commented Aug 25, 2025

@HyukjinKwon @asl3 @zhengruifeng pls take a look

@benrobby benrobby changed the title [SPARK-53367][PYTHON] add int to decimal coercion for Arrow UDFs [SPARK-53367][PYTHON][SQL] add int to decimal coercion for Arrow UDFs Aug 25, 2025