-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce support for parsing common JSON operators #10981
Conversation
One way might be to use functions (rather than operators) to represent this So for example, the planner might convert Edit; though I can see it is not obvious how to do this in the existing planner 🤔 |
@alamb, yup we'd be very happy to do this via query rewriting (either in The problem is AFAIK right now an error is raised when the operator (e.g. Hence we just need the operator to survive in the AST long enough for us to replace it. |
I imagined that the implementation path would be something similar to how If there's a way to do the rewriting that doesn't require changes to the SQL parsing to support the additional operators, happy to stick to that for now, we just didn't see a way to do that. (We hit the error produced by this line
(Sorry for confusion if I'm misunderstanding your comment!) |
I agree - If there is no other way to support the syntax feature we should add it to the core. Another idea would be to add an API to the planner that allows planning custom operators that is called before the built in rules. That API might take a little longer to get in, but it would be much more flexible |
I hope #11132 is the first step towards implementing this. |
See #11137 which should fully replace this. |
Closing in favor of #11137 |
Opening this PR at this (early) stage primarily for the sake of getting initial feedback on its viability; @samuelcolvin has additional context.
The idea is that we want to experiment with rewriting operators to use the JSON functions in https://github.com/datafusion-contrib/datafusion-functions-json, but it would help if
datafusion
supported parsing the relevant operators. We can forkdatafusion
to add this support if necessary, but even if that's the case I was hoping to get some guidance about the implementation, and in particular feedback on whether the path this is going down is "wrong" or has a chance of getting merged eventually as the JSON functions stuff gets more mature.Which issue does this PR close?
Related to #7845, though it does not close that issue.
Rationale for this change
We are using https://github.com/datafusion-contrib/datafusion-functions-json, but the problem is that, as far as I can tell, there is not a way to add support for new operators without modifying the SQL parsing in the main
datafusion
crate. We want to add support for the common JSON operators (at least?
and->
/->>
, if not the others used by Postgres), and are happy to try those out ourselves, but it would be nice if parsing support was included in thedatafusion
crate even if the new operators remain completely unused.The good news is that the various operators can already be parsed by the
sqlparser
crate so adding support is mostly mechanical, other than determining their signatures (which admittedly may pose a problem/debate).What changes are included in this PR?
This PR currently just adds support for parsing the
?
(Question
),->
(Arrow
), and->>
(LongArrow
) operators fromsqlparser::ast::BinaryOperator
.I'll note that Postgres also has
#>
(HashArrow
),#>>
(HashLongArrow
),#-
(HashMinus
),@?
(AtQuestion
),?&
(QuestionAnd
),?|
(QuestionPipe
),@@
(AtAt
), as established JSON operators that it might be nice to add support for at the same time, considering these are already cases in thesqlparser::ast::BinaryOperator
enum. I would be happy to also add support for some/all of those as well if desirable.Are these changes tested?
Not yet, I'll be happy to add some tests if this has a chance of being accepted.
Are there any user-facing changes?
This PR adds the ability to parse these operators. For most/all of them, we'll presumably want any actual use to produce errors if the operator isn't involved in a user-defined rewrite. Right now, given our interest in using them for JSON, I think we'd like to support
Utf8
/LargeUtf8
as inputs where aJSON
value would be expected, but I'm not sure if that makes sense if we want to introduce support for other types.