-
Notifications
You must be signed in to change notification settings - Fork 31
Add rfc for native built in functions #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add rfc for native built in functions #41
Conversation
@kevintang2022 |
Adding this should have no effect on the NativeFunctionNamespaceManager. NativeFunctionNamespaceManager is still the serving namespace manager for The reason it is needed is because we want users to be able to call native functions without specifying the full qualified name. It seems in the current implementation of sidecar, users need to fully type |
Thanks, I see. |
Actually, it's not required to use native.default with sidecar enabled. You can have presto.default and have sidecar enabled, and you can call the native functions by using fully qualified name. The reason this is needed is because constant folding is not quite ready to be used with the sidecar functions. This means that we still need to use coordinator implementation in some cases, but use worker implementation in some other cases. You're right that if we just want to always use the worker implementation, then NativeFunctionNamespaceManager will work as is. However, we need to have some selective logic for the use case of sometimes using Java implementation and sometimes using C++ implementation. Also, we don't want users to have to specify the full qualified function name |
@kevintang2022 Got it, the use case is clear to me now. Thanks for clarifying.
Have you tested the scenario where this exact setup works? Prestissimo doesn't support multiple namespaces. The functions that we report from Prestissimo need to be under a namespace, in the case where sidecar is enabled it is Additionally another question is, when you run queries let's say |
* Function with only C++ implementation: no constant fold | ||
* Function with java and c++ implementation: constant fold, existing behavior | ||
* Function with only SQL implementation: use sql implementation, existing behavior | ||
* Function with SQL and C++ implementation: use c++ implementation, new behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function with SQL and C++ implementation: use c++ implementation, new behavior.
Do we have an example of this implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
map_remove_null_values has a SQL implementation in coordinator and C++ implementation from worker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
map_remove_null_values has a SQL implementation in coordinator and C++ implementation from worker
Yes, I tested this with presto.default namespace as default config for both coordinator and worker. In this scenario, it's not possible to run native.defauolt functions, but everything else will be the same as if the sidecar was not enabled. This is because for most cases, we just want existing behavior. It's only for very few cases where we would want to change existing behavior (like C++ implementation overriding a SQL implementation)
No, this would not throw error because we never use native.default. It will be presto.default for both coordinator and worker. |
I see this is for a specific use case, where in you don't want to use the |
Instead of adding a BuiltInPluginFunctionNamespaceManager, add a BuiltInNativeFunctionNamespaceManager. Register the functions from native sidecar under the presto.default namespace. During function evaluation inside of FunctionAndTypeManager#resolveFunctionInternal and getMatchingFunctionHandle, add some logic to determine whether the function in BuiltInTypeAndFunctionNamespaceManager or the function in BuiltInNativeFunctionNamespaceManager is used. This is similar to the PR, except the logic in PR will throw an error if it sees that there is a conflict. If there is a conflict, follow this logic to determine which functions to use: | ||
|
||
|
||
* If the one already inside the BuiltInTypeAndFunctionNamespaceManager is a Java implementation, then use the one in BuiltInTypeAndFunctionNamespaceManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you don't have access to the FunctionMetadata
at this point to determine its implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within functionAndTypeManager, we should be able to access the functionMetadata to determine the implementation type.
https://github.com/prestodb/presto/pull/25597/files
Add something like this in the getMatchingFunctions
FunctionHandle functionHandle1 = functionNamespaceManager.getFunctionHandle(transactionHandle, matchingDefaultFunctionSignature.get());
FunctionHandle functionHandle2 = builtInPluginFunctionNamespaceManager.getFunctionHandle(matchingPluginFunctionSignature.get());
// can read the function implementation type like this
functionNamespaceManager.getFunctionMetadata(functionHandle1).getImplementationType()
Yes that describes our use case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This RFC doesn't have an adoption plan, which is a part of the template. Can you add details on the eventual deprecation and removal of it (once constant folding fully delegates to C++ evaluation)?
efed2f4
to
0245089
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this writeup.
0245089
to
9af6017
Compare
Summary: This PR includes changes for Built in functions for different use cases - Register SQL invoked function in built in namespace when they come from plugins (prestodb#25597) - Register Native functions in built in namespace when they come from sidecar (prestodb/rfcs#41) Since these use cases have a lot of overlap, we can introduce an abstract class in order to reduce duplicate logic. ## Description ## Motivation and Context ## Impact Pull Request resolved: prestodb#25661 Test Plan: <!---Please fill in how you tested your change--> ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == RELEASE NOTES == General Changes * ... * ... Hive Connector Changes * ... * ... ``` If release note is NOT required, use: ``` == NO RELEASE NOTE == ``` Rollback Plan: Differential Revision: D79301760 Pulled By: kevintang2022
Summary: This PR includes changes for Built in functions for different use cases - Register SQL invoked function in built in namespace when they come from plugins (prestodb#25597) - Register Native functions in built in namespace when they come from sidecar (prestodb/rfcs#41) Since these use cases have a lot of overlap, we can introduce an abstract class in order to reduce duplicate logic. ## Description ## Motivation and Context ## Impact Pull Request resolved: prestodb#25661 Test Plan: <!---Please fill in how you tested your change--> ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == RELEASE NOTES == General Changes * ... * ... Hive Connector Changes * ... * ... ``` If release note is NOT required, use: ``` == NO RELEASE NOTE == ``` Rollback Plan: Differential Revision: D79301760 Pulled By: kevintang2022
Summary: This PR includes changes for Built in functions for different use cases - Register SQL invoked function in built in namespace when they come from plugins (prestodb#25597) - Register Native functions in built in namespace when they come from sidecar (prestodb/rfcs#41) Since these use cases have a lot of overlap, we can introduce an abstract class in order to reduce duplicate logic. ## Description ## Motivation and Context ## Impact Pull Request resolved: prestodb#25661 Test Plan: <!---Please fill in how you tested your change--> ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == RELEASE NOTES == General Changes * ... * ... Hive Connector Changes * ... * ... ``` If release note is NOT required, use: ``` == NO RELEASE NOTE == ``` Rollback Plan: Differential Revision: D79301760 Pulled By: kevintang2022
* Function with only java implementation: constant fold, existing behavior | ||
* Function with only C++ implementation: no constant fold | ||
* Function with java and c++ implementation: constant fold, existing behavior | ||
* Function with only SQL implementation: use sql implementation, existing behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would presume SQL inlining happens before constant folding. They are orthogonal concerns as far as I understand. SQL inlining issues are handled by https://github.com/prestodb/rfcs/pull/40/files only.
Can you confirm ?
But there is a correlation in that :
- If there is a C++ function for a SQL function, then we want to pick up the C++ function. So if the function was originally constant folded it will not anymore.
- If the SQL function expands to only java functions then it will get constant folded.
- For new SQL functions that expand to new C++ functions, these are not constant folded.
That's about the impact to the current RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This RFC is about picking which FunctionHandle to use. If there is a SQL implementation and a C++ implementation, then the C++ function handle will be used. If there is a Java implementation, then the java function handle will take priority. By doing this, it creates the side effect of which functions are constant folded or not, and this is determined by the implementation type.
This code ponter shows that Java implemenetation are constant folded, but C++ implementations are not.
https://github.com/prestodb/presto/blob/8041eb5e3ae0dc6e6a656d3e68d90ed8e1fda099/presto-spi/src/main/java/com/facebook/presto/spi/function/FunctionImplementationType.java#L16
|
||
* If we choose to have separate namespaces for built in functions from coordinator and functions from sidecar, then this would require extra changes on the worker code. Currently the worker code is using one prefix for every function, and this prefix is configurable. However, you cannot set a different prefix for different functions coming from the sidecar; they would all be the same prefix. | ||
|
||
* Since adding more worker changes adds extra complexity and confusion, we decided to use presto.default. prefix for every function from sidecar that is registered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then how are you handling conflicts then ? I do see it depends on the code you have suggested below. But can you describe what the desired outcome is in a functional manner here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functional outcome is that
- it keeps constant folding as an existing behavior for functions that are currently constant folded
- uses C++ implementation if a function is not currently being constant folded
- allows users to take advantage of this optimization without having to specifiy fully qualified function name
9af6017
to
0af3a74
Compare
No description provided.