Null values for non active union members #466
Comments
What are you using to convert your dynamic values into JSON? The capnproto-rust/capnp/src/stringify.rs Lines 105 to 166 in eaad5e5
The JSON was just a visual example. I am actually converting into Arrow arrays and then Polars series for a Polars dataframe. I will dig into the stringify code to see if it has the logic I am missing. My problem may be different because I am going from row-wise to columnar. My real input is binary files with an unknown number of messages. I create the Arrow schema with all of the same fields as the capnp schema, then iterate through the fields in the schema to create a vector of capnp readers. The capnp readers are then converted into Arrow arrays. The main problem with nested types is that I need to represent a struct and all the types within it even if it is not active in the union.
If we have three messages (I would actually convert to binary before running them in tests):
I need to create three arrow arrays: a
To help make these arrays my plan is to make the following capnp readers in this pseudocode (values of primitives in comments):
The challenge is getting a struct with a null
Another option would be to have the primitive readers for non-active fields yield null values. This is the line in my code that extracts the primitive values. Note that the code I am testing on unions has not been pushed. Does this help explain the problem?
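To make the nested case concrete, here is a minimal sketch of a nullable struct column in plain Rust. It is only an illustration of the columnar layout being discussed, not code from capnproto-rust or from an Arrow crate: `NullableI32`, `PointColumn`, and the `(x, y)` struct shape are all hypothetical names. The point it shows is that an inactive struct member still appends one slot to every child column, with the validity mask marking that slot as null.

```rust
/// Hypothetical nullable primitive column: a values buffer plus a validity
/// mask, loosely modelled on how columnar formats store nulls.
#[derive(Debug, Default)]
struct NullableI32 {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl NullableI32 {
    /// Append a value, or a placeholder zero marked invalid when `v` is `None`.
    fn push(&mut self, v: Option<i32>) {
        self.values.push(v.unwrap_or_default());
        self.validity.push(v.is_some());
    }

    /// Read slot `i`, honouring the validity mask.
    fn get(&self, i: usize) -> Option<i32> {
        if self.validity[i] {
            Some(self.values[i])
        } else {
            None
        }
    }
}

/// Hypothetical struct column with two children and its own validity mask.
#[derive(Debug, Default)]
struct PointColumn {
    x: NullableI32,
    y: NullableI32,
    validity: Vec<bool>,
}

impl PointColumn {
    /// `None` means the struct was not the active union member for this row.
    /// Every child still grows by one slot so all columns stay the same length.
    fn push(&mut self, p: Option<(i32, i32)>) {
        self.validity.push(p.is_some());
        self.x.push(p.map(|(x, _)| x));
        self.y.push(p.map(|(_, y)| y));
    }
}
```

With this shape, the decision about nullness is made once, where the union is inspected, and the primitive extraction code never has to ask a reader whether its value is real.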
Have you tried making the first member of your union a dummy? See https://capnproto.org/language.html#unions:
"By default, when a struct is initialized, the lowest-numbered field in the union is “set”. If you do not want any field set by default, simply declare a field called “unset” and make it the lowest-numbered field."
Said differently, in capnproto unions are not messages. The union is not a pointer that can be left null; the union members are inline in that place in the message, and leaving that as all-zeroes just means
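A rough model of that suggestion in plain Rust, assuming a simplified union with a numeric tag and a single shared integer slot. The names `Which`, `decode`, and `to_nullable`, and the tag numbering, are hypothetical and do not come from the capnp crate; the sketch only illustrates why an explicit "unset" member at tag 0 lets an all-zero union map to nulls in every column.

```rust
/// Hypothetical decoded view of a union whose lowest-numbered member is a
/// dummy "unset" field, followed by two integer members.
#[derive(Debug, PartialEq)]
enum Which {
    Unset,
    Foo(i32),
    Bar(i32),
}

/// Simplified decode step: `tag` stands in for the union discriminant and
/// `payload` for the inline data slot. An all-zero union has tag 0, which
/// this model assigns to the "unset" member.
fn decode(tag: u16, payload: i32) -> Option<Which> {
    match tag {
        0 => Some(Which::Unset),
        1 => Some(Which::Foo(payload)),
        2 => Some(Which::Bar(payload)),
        _ => None, // tag not known to this (hypothetical) schema
    }
}

/// Map the active member to one nullable cell per column, in (foo, bar) order.
fn to_nullable(which: &Which) -> (Option<i32>, Option<i32>) {
    match *which {
        Which::Unset => (None, None),
        Which::Foo(v) => (Some(v), None),
        Which::Bar(v) => (None, Some(v)),
    }
}
```

Without the dummy member, tag 0 would belong to a real field, so a freshly initialized union would be indistinguishable from that field holding its default value.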
I am converting a series of capnp messages into a columnar format (Arrow specifically). One of the challenges with unions is non-active union fields. I recursively create a vector of dynamic value readers for each field and then convert that into arrays of Arrow memory. When the field is a member of a union and active, this works fine. When the field is not active, this creates fake data instead of null.
For example the schema:
With data:
[{"foo": 1}, {"bar": 1}]
generates the output: [{"foo": 1, "bar": 0}, {"foo": 0, "bar": 1}]. What I would expect is [{"foo": 1, "bar": null}, {"foo": null, "bar": 1}].
I have tried creating dynamic_value::Reader::Void when the field is non-active, but this is a challenge with nested struct and list types. For structs I have tried creating a new empty dynamic_struct::StructReader using the private layout:
This still leads to primitive ints with 0 value.
Is it possible to create readers with null values?
Would it make sense to have non-active union fields have null values (I assume the expectation is users check has to find active values and ignore non-active values)?
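For reference, the expected output for the foo/bar example can be sketched in plain Rust, assuming the active member of each message has already been identified. `Row`, `Columns`, and `to_columns` are hypothetical stand-ins, not capnp or Arrow types; in real code the `match` would be driven by whatever the reader reports as the active union member.

```rust
/// Hypothetical stand-in for one decoded message with a two-member union.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Row {
    Foo(i32),
    Bar(i32),
}

/// One nullable column per union member. `None` marks rows where that
/// member was not the active one.
#[derive(Debug, Default, PartialEq)]
struct Columns {
    foo: Vec<Option<i32>>,
    bar: Vec<Option<i32>>,
}

/// Convert row-wise messages into columns, writing a null for every
/// non-active member instead of reading its default value.
fn to_columns(rows: &[Row]) -> Columns {
    let mut cols = Columns::default();
    for row in rows {
        match *row {
            Row::Foo(v) => {
                cols.foo.push(Some(v));
                cols.bar.push(None);
            }
            Row::Bar(v) => {
                cols.foo.push(None);
                cols.bar.push(Some(v));
            }
        }
    }
    cols
}
```

Calling `to_columns(&[Row::Foo(1), Row::Bar(1)])` gives a `foo` column of `[Some(1), None]` and a `bar` column of `[None, Some(1)]`, which corresponds to the expected JSON above.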