-
Notifications
You must be signed in to change notification settings - Fork 2.8k
NIFI-14457 New processor to run xQuery on BaseX #9933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
<url>https://files.basex.org/maven</url> | ||
</repository> | ||
</repositories> | ||
<properties> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these property entries should not be needed and we want to avoid having such things in nars themselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
<artifactId>nifi-basex-nar</artifactId> | ||
<packaging>nar</packaging> | ||
<repositories> | ||
<repository> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The org.basex artifacts 'appear' to be in maven central. Is it strictly necessary to add a specific one here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I changed to the newest one vestion from maven and deleted it.
<artifactId>nifi-basex-processors</artifactId> | ||
|
||
<properties> | ||
<maven.compiler.source>21</maven.compiler.source> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These properties should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
<repository> | ||
<id>basex</id> | ||
<name>BaseX Maven Repository</name> | ||
<url>https://files.basex.org/maven</url> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments a repo entry in other pom
<dependency> | ||
<groupId>org.basex</groupId> | ||
<artifactId>basex</artifactId> | ||
<version>7.6</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this version appears to be extremely old. This is a non-starter for inclusion. If we're adding a new component it needs to be on the latest stable release of that component in general or some really strong case to not do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like in the comment above.
+ "'success' (on successful execution), 'failure' (if the script fails), and 'original' (preserves the unmodified FlowFile)." | ||
) | ||
@SystemResourceConsiderations({ | ||
@SystemResourceConsideration(resource = SystemResource.MEMORY, description = "Processing requires reading the entire FlowFile into memory") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is good this is here - but I'd recommend making it more dramatic in description.
As currently written the entire content of the flowfile is read into memory and I think as written that might be twice the size at least. On serialization it is also at least once but might be twice as well. Either way this is for large XML documents a serious consideration for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will try to implement it.
<parent> | ||
<groupId>org.apache.nifi</groupId> | ||
<artifactId>nifi-basex-bundle</artifactId> | ||
<version>2.4.0-SNAPSHOT</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of these will need to be bumped to 2.5.0-SNAPSHOT
ALSO the nifi-extension-bundles pom needs to know about the nifi-basex-bundle pom or it will not be built
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
@@ -0,0 +1,277 @@ | |||
/* | |||
* Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[22,43] (imports) AvoidStarImport: Using the '.' form of import should be avoided - org.apache.nifi.annotation.behavior..
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[25,34] (imports) AvoidStarImport: Using the '.' form of import should be avoided - org.apache.nifi.components..
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[168,13] (whitespace) WhitespaceAround: '}' is not followed by whitespace.
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[168,14] (whitespace) WhitespaceAround: 'else' is not preceded with whitespace.
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[218,5] (blocks) LeftCurly: '{' at column 5 should be on the previous line.
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[233,5] (blocks) LeftCurly: '{' at column 5 should be on the previous line.
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[244,88] (whitespace) WhitespaceAround: '{' is not preceded with whitespace.
<dependency> | ||
<groupId>org.apache.nifi</groupId> | ||
<artifactId>nifi-standard-shared-nar</artifactId> | ||
<version>2.3.0-SNAPSHOT</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2.5.0-SNAPSHOT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I changed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concurring with feedback from @joewitt, there are a large number of issues to be addressed before this can be considered for inclusion.
The most fundamental issue is the legacy version of the library referenced. The latest version in Maven Central is 11.9 at the time of this writing.
The hard-coded batching approach also raises some implementation questions.
@Override | ||
public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException { | ||
Context basexContext = new Context(); | ||
final List<FlowFile> flowFileBatch = session.get(50); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason for having this hard-coded batch of 50, as opposed to just handling one FlowFile and supporting batching through annotations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this strategy from EvaluateXQuery processor, but if it's not needed I can change it to one FlowFile.
Extensive feedback has been provided and considerable work is needed to reconsider for inclusion. This is very likely a good candidate to exist as its own github project for now to evaluate for inclusion later with adoption/usage and signs of maintenance. I am closing the PR for now but it can be revisited once feedback has been incorporated. |
After corrections I will prepare new request. |
Summary
Nifi 14457- New processor to run xQuery on BaseX.
Introduced a new
EvaluateBaseXQuery
for executing XQuery queries against a remote or embedded BaseX XML database instance. This processor allows integration with BaseX for high-performance XML data querying and transformation.Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-XXXXX
NIFI-XXXXX
Pull Request Formatting
main
branchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-check
Licensing
LICENSE
andNOTICE
files (nifi-basex-processor-nar
)nifi-assembly
LICENSE
andNOTICE
filesDocumentation