Difficulties in handling NaN and NULL #11029
Comments
Hi! Thanks for the issue. I DON'T want to make the same mistake that pandas did, and treat NaN and +/-infinity the same as NULL. I think the current default behavior is good. I also do NOT want fill_na and fill_null to act differently. They are too similarly named, people would footgun themselves.
The alternative to 2, and what I recommend in the meantime, is making a custom function for yourself:

    import ibis.expr.types as ir
    from ibis import _

    def is_nullish(n: ir.Value) -> ir.BooleanValue:
        # Only floating-point values can be NaN, so check isnan() only for them.
        if n.type().is_floating():
            return n.isnull() | n.isnan()
        else:
            return n.isnull()

    # and then use as
    float_col=is_nullish(_.float_col).ifelse(-1, _.float_col),
Hello, and thank you very much for your feedback.

I understand and agree that Pandas is making a mistake by not differentiating between NaN and NULL, and that it's a good idea for Ibis to make this distinction. If Ibis makes this technical choice, users should be aware of this distinction (by improving the documentation), in which case keeping the two functions fill_na and fill_null shouldn't be a problem. In other words, if the coexistence of these two functions is considered ambiguous, it's necessarily because the technical choice to differentiate between NaN and NULL was not sufficiently explained.

In my opinion, it's more the current behavior that can be a source of error, mistakenly thinking that all missing values have been filled after using fill_null, when NaN values remain. This is precisely what happened to me, and debugging it wasn't easy. It's possible that other users have made the same mistake in scripts running in production, without necessarily noticing it yet.

Otherwise, I think adding a nan_as_null (bool) argument to the fill_null function (not just isnull) could be a good compromise! This would be a simple and fairly natural way to make users aware that Ibis distinguishes between null and NaN, without having to reintroduce the deprecated fill_na function or change the default behavior.
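To make the nan_as_null proposal concrete, here is a rough plain-Python model of the semantics being suggested. It is only a sketch: `fill_null_model` is a hypothetical name, it operates on an ordinary list rather than an Ibis column, and it is not how Ibis implements (or would implement) `fill_null`.

```python
import math


def fill_null_model(values, fill, nan_as_null=False):
    """Hypothetical model of the proposed semantics, not Ibis code.

    None stands in for NULL. With the default nan_as_null=False only None is
    replaced, which mirrors the current behavior described in this thread.
    With nan_as_null=True, float NaN is replaced as well.
    """
    filled = []
    for v in values:
        is_null = v is None
        is_nan = isinstance(v, float) and math.isnan(v)
        if is_null or (nan_as_null and is_nan):
            filled.append(fill)
        else:
            filled.append(v)
    return filled


col = [1.0, None, float("nan")]

# Default: the NaN survives, which is the surprise reported in this issue.
default_result = fill_null_model(col, -1.0)

# Opt-in flag: NULL and NaN are both filled.
opt_in_result = fill_null_model(col, -1.0, nan_as_null=True)
```

With the flag left at its default, the last element of `default_result` is still NaN; with `nan_as_null=True`, `opt_in_result` is `[1.0, -1.0, -1.0]`. Keeping the flag off by default preserves the existing behavior while making the distinction visible in the function signature.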
Hello,
I'm having some trouble handling NaN and NULL values, especially for float columns. I don't know if the documentation could be improved on this point, if the API could be improved, or if I'm going about it the wrong way.
Here's a simple example to illustrate my problem. I'm creating a simple table with a string column, an int column, and a float column, with some missing values.
The Pandas display doesn't show this, but note that in the case of a float column, I can have a NULL value and a NaN value, which are technically not the same thing. This is visible in my database.
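As a small illustration of why the two are not the same thing, the sketch below uses plain Python, with `None` standing in for a database NULL (that mapping is an assumption made for illustration) and `float("nan")` for NaN. The helper names `is_null` and `is_nan` are made up here and are not Ibis functions.

```python
import math

null_value = None         # stands in for NULL: no value is present at all
nan_value = float("nan")  # an IEEE 754 float that is present but "not a number"


def is_null(x):
    # True only when the value is absent.
    return x is None


def is_nan(x):
    # True only for a float that is NaN; None is not a float, so it is not NaN.
    return isinstance(x, float) and math.isnan(x)


values = [1.5, null_value, nan_value]
null_flags = [is_null(v) for v in values]  # [False, True, False]
nan_flags = [is_nan(v) for v in values]    # [False, False, True]

# NaN is a real float value that compares unequal even to itself.
nan_is_self_equal = nan_value == nan_value  # False
```

A null check and a NaN check therefore flag different rows, so an operation that fills only nulls leaves the NaN row untouched.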


When I try to use fill_null on the float column, it only applies to the NULL value and not the NaN value.
So I try using fillna instead, but it doesn't change anything because it's just an alias for fill_null (and, moreover, fillna will soon be deprecated).
Ultimately, the only way I have to solve the problem is to use an ifelse, which considerably complicates the syntax.
Wouldn't it be interesting to keep fillna while adapting the implementation to this type of case? Or, better yet, to handle NaNs and NULLs interchangeably when applying fill_null to a float column?
Thank you in advance for your help!
Regards,
What version of ibis are you using?
What backend(s) are you using, if any?