gurpal75 commented Aug 24, 2025

What changes were proposed in this pull request?

This PR fixes SPARK-53347:
from_protobuf() incorrectly deserializes google.protobuf.BoolValue fields set to false as null.

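In proto3, singular scalar fields such as bool have no field presence unless they are declared optional, so an unset bool and a bool set to false look the same. Wrapper types such as google.protobuf.BoolValue are messages, and message fields do track presence, so they can distinguish between: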

  • field absent → null
  • field present with false → false
  • field present with true → true
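These three cases can be modelled with Option, where the Option carries the presence information. In the sketch below, BoolValue and BoolWrapper are hypothetical case classes named after the BoolWrapper message used later in this description. They are not the classes protoc generates and not a Spark API. The sketch only shows how each case should map to a nullable BooleanType value.

```scala
// Hypothetical in-memory model, NOT protoc-generated classes.
final case class BoolValue(value: Boolean)

// Option models presence of the message-typed field: None means the field was absent.
final case class BoolWrapper(flag: Option[BoolValue])

object Presence {
  // Map the wrapper field to a nullable Boolean, as a nullable BooleanType column would hold it.
  def toNullable(msg: BoolWrapper): java.lang.Boolean = msg.flag match {
    case None               => null                          // absent        -> null
    case Some(BoolValue(v)) => java.lang.Boolean.valueOf(v)  // present       -> its value, including false
  }
}
```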

The existing Spark Protobuf deserializer dropped the distinction for BoolValue(false) and returned null.
This patch updates ProtobufDeserializer so that BoolValue(false) is correctly deserialized as false, while keeping the existing semantics for all other types.
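The distinction is preserved on the wire. Presence of a message field is recorded by the outer field tag, not by the inner value. BoolValue(false) serializes as tag 0x0A followed by length 0, because proto3 omits the default inner false, while an absent field writes nothing at all.

The Scala sketch below hand-decodes only the BoolWrapper shape from this description, assuming single-byte lengths and no other fields, to make the difference visible. WireSketch and both decode functions are hypothetical and are not Spark's ProtobufDeserializer. The decodeDefaultAsNull variant only reproduces the reported symptom. It is not a claim about the actual root cause in Spark.

```scala
object WireSketch {
  // Tag of BoolWrapper.flag: field 1, wire type 2 (length-delimited) -> (1 << 3) | 2 = 0x0A.
  private val FlagTag: Int = 0x0A
  // Tag of BoolValue.value: field 1, wire type 0 (varint) -> (1 << 3) | 0 = 0x08.
  private val ValueTag: Int = 0x08

  // Returns the inner BoolValue bytes if field 1 is present, None otherwise.
  // Assumes a single-byte length and no other fields, for brevity.
  private def flagPayload(bytes: Array[Byte]): Option[Array[Byte]] =
    if (bytes.length >= 2 && (bytes(0) & 0xFF) == FlagTag) {
      val len = bytes(1) & 0xFF
      Some(bytes.slice(2, 2 + len))
    } else None

  // proto3 does not write a false scalar, so an empty inner payload means false.
  private def boolValue(payload: Array[Byte]): Boolean =
    payload.length >= 2 && (payload(0) & 0xFF) == ValueTag && payload(1) != 0

  // Presence-based decoding: only an absent field becomes null.
  def decodeWithPresence(bytes: Array[Byte]): java.lang.Boolean =
    flagPayload(bytes) match {
      case None          => null
      case Some(payload) => java.lang.Boolean.valueOf(boolValue(payload))
    }

  // Hypothetical faulty variant: treats the default (false) as "unset",
  // which yields null for BoolValue(false), the symptom described in SPARK-53347.
  def decodeDefaultAsNull(bytes: Array[Byte]): java.lang.Boolean =
    flagPayload(bytes) match {
      case Some(payload) if boolValue(payload) => java.lang.Boolean.TRUE
      case _                                   => null
    }
}
```

For example, the payload Array[Byte](0x0A, 0x00), which is flag = BoolValue(false), decodes to false with decodeWithPresence but to null with decodeDefaultAsNull.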

Why are the changes needed?

Without this patch, Spark users cannot distinguish between false and null when reading Protobuf data using from_protobuf().
This breaks correctness in queries where boolean fields are optional but explicitly set to false.

Correct handling is critical for applications that rely on distinguishing unset from false values (e.g. feature flags, filters, optional booleans in business logic).

Does this PR introduce any user-facing change?

Yes.
Previously, given the schema:

// Protobuf schema (proto3)
syntax = "proto3";
import "google/protobuf/wrappers.proto";

message BoolWrapper {
  google.protobuf.BoolValue flag = 1;
}

parsing a message whose flag field was explicitly set to false with:

// Scala
df.select(from_protobuf($"bytes", "BoolWrapper", desc)).show()

produced:

+----+
|flag|
+----+
|null|
+----+

After the patch, the same query produces:

+-----+
| flag|
+-----+
|false|
+-----+

The behaviour when the value is true, or when the field is absent, is unchanged.

How was this patch tested?

Added a new unit test in ProtobufFunctionsSuite that checks:

  • BoolValue.of(true) → true
  • BoolValue.of(false) → false
  • absent field → null

Ran the existing ProtobufFunctionsSuite and related Protobuf tests to confirm there are no regressions.
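For reference, here is a sketch of how the three test inputs differ once serialized. The encoder is hypothetical and hand-written for the single BoolWrapper shape only. It is not taken from ProtobufFunctionsSuite, which may construct its inputs differently.

```scala
object TestPayloads {
  // BoolValue.value is a proto3 scalar: false is the default and is not written.
  private def encodeBoolValue(v: Boolean): Array[Byte] =
    if (v) Array[Byte](0x08, 0x01) else Array.emptyByteArray

  // BoolWrapper.flag is a message field: None omits it entirely; Some writes tag 0x0A,
  // a single-byte length (sufficient for this tiny payload), then the inner bytes.
  def encodeBoolWrapper(flag: Option[Boolean]): Array[Byte] = flag match {
    case None => Array.emptyByteArray
    case Some(v) =>
      val inner = encodeBoolValue(v)
      Array[Byte](0x0A, inner.length.toByte) ++ inner
  }

  // The three cases listed above, paired with the expected column value.
  val cases: Seq[(Option[Boolean], java.lang.Boolean)] = Seq(
    Some(true)  -> java.lang.Boolean.TRUE,
    Some(false) -> java.lang.Boolean.FALSE,
    None        -> null
  )
}
```

The false case is two bytes (0x0A 0x00) rather than zero, which is why a presence-aware deserializer can return false instead of null.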

Was this patch authored or co-authored using generative AI tooling?

Yes, with the help of Genie.

Generated-by: Genie
