[SPARK-53275][SQL] Handle stateful expressions when ordering in interpreted mode
### What changes were proposed in this pull request?
This PR updates `InterpretedOrdering` to use a different copy of stateful expressions when evaluating the two input rows.
### Why are the changes needed?
Consider these spark-shell commands:
```
# for this particular example, the bug is exercised when there are 2 executors
bin/spark-shell --master "local[2]"
import org.apache.spark.sql.functions.udf
spark.udf.register("udf", (s: String) => s)
Seq((0, "2"), (0, "1")).toDF("a", "b").createOrReplaceTempView("v1")
// returns the correct result: Array([0,1], [0,2])
sql("select a, udf(b) from v1 order by a, udf(b) asc").collect
// switch to interpreted mode
sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
// returns an incorrect result: Array([0,2], [0,1])
sql("select a, udf(b) from v1 order by a, udf(b) asc").collect
```
This is because the `ScalaUDF` expression indirectly holds an `UnsafeRow` as a buffer (via a serializer, which holds an `UnsafeProjection`, which holds the `UnsafeRow` buffer). When the UDF is evaluated for the first row, the resulting `UTF8String` uses the `UnsafeRow`'s base object as its own base object. When the UDF is evaluated for the second row, that same base object is overwritten, so both `UTF8String` objects contain the same string value. The comparison therefore treats the two `udf(b)` values as equal, and the rows are not reordered.
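The aliasing described above can be reproduced without Spark. The following is a minimal sketch, not the actual `InterpretedOrdering` code: `BytesView`, `StatefulExpr`, `freshCopy`, and `OrderingSketch` are hypothetical names invented for illustration, standing in loosely for `UTF8String`, a `ScalaUDF` with a reused buffer, and the ordering logic. It assumes inputs of at most 16 bytes.

```scala
// Illustrative model only: these classes are hypothetical stand-ins, not Spark's API.

// A view over a byte array, loosely analogous to a UTF8String backed by a base object.
final class BytesView(val base: Array[Byte], val length: Int) {
  // Unsigned, byte-wise lexicographic comparison.
  def compareTo(other: BytesView): Int = {
    val n = math.min(length, other.length)
    var i = 0
    while (i < n) {
      val c = (base(i) & 0xff) - (other.base(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    length - other.length
  }
}

// A stateful "expression" that writes every result into one reusable buffer,
// loosely analogous to a ScalaUDF whose serializer reuses an UnsafeRow.
final class StatefulExpr {
  private val buffer = new Array[Byte](16) // assumption: inputs fit in 16 bytes

  def eval(input: String): BytesView = {
    val bytes = input.getBytes("UTF-8")
    System.arraycopy(bytes, 0, buffer, 0, bytes.length)
    new BytesView(buffer, bytes.length) // the result aliases the shared buffer
  }

  // Hypothetical helper: a new instance with its own buffer.
  def freshCopy(): StatefulExpr = new StatefulExpr
}

object OrderingSketch {
  // Buggy pattern: one instance evaluates both rows, so the second eval
  // overwrites the bytes that `left` still points at.
  def compareShared(expr: StatefulExpr, a: String, b: String): Int = {
    val left = expr.eval(a)
    val right = expr.eval(b)
    left.compareTo(right)
  }

  // Fixed pattern: each side of the comparison uses its own instance.
  def compareSeparate(exprA: StatefulExpr, exprB: StatefulExpr, a: String, b: String): Int = {
    val left = exprA.eval(a)
    val right = exprB.eval(b)
    left.compareTo(right)
  }

  def main(args: Array[String]): Unit = {
    val e = new StatefulExpr
    println(compareShared(e, "2", "1"))                  // prints 0: both views see "1"
    println(compareSeparate(e, e.freshCopy(), "2", "1")) // prints 1: "2" sorts after "1"
  }
}
```

In this model the shared-instance comparison reports the two values as equal, which mirrors why the rows keep their input order in the example above. Giving each input row its own copy of the expression removes the aliasing.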
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit test.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52028 from bersprockets/ordering_issue.
Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Peter Toth <peter.toth@gmail.com>