Skip to content

NIFI-14457 New processor to run xQuery on BaseX #9933

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

TomaszK-stack
Copy link
Contributor

Summary

Nifi 14457- New processor to run xQuery on BaseX.

Introduced a new EvaluateBaseXQuery for executing XQuery queries against a remote or embedded BaseX XML database instance. This processor allows integration with BaseX for high-performance XML data querying and transformation.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-XXXXX
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-XXXXX

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • BaseX dependency (BSD 3-Clause) verified to be compatible with Apache License 2.0
  • BaseX dependency added to module-level LICENSE and NOTICE files (nifi-basex-processor-nar)
  • BaseX added to root nifi-assembly LICENSE and NOTICE files

Documentation

  • Processor documentation renders correctly
  • Usage examples and configuration parameters documented

<url>https://files.basex.org/maven</url>
</repository>
</repositories>
<properties>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these property entries should not be needed and we want to avoid having such things in nars themselves

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

<artifactId>nifi-basex-nar</artifactId>
<packaging>nar</packaging>
<repositories>
<repository>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The org.basex artifacts 'appear' to be in maven central. Is it strictly necessary to add a specific one here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I changed to the newest one vestion from maven and deleted it.

<artifactId>nifi-basex-processors</artifactId>

<properties>
<maven.compiler.source>21</maven.compiler.source>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These properties should be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

<repository>
<id>basex</id>
<name>BaseX Maven Repository</name>
<url>https://files.basex.org/maven</url>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments a repo entry in other pom

<dependency>
<groupId>org.basex</groupId>
<artifactId>basex</artifactId>
<version>7.6</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this version appears to be extremely old. This is a non-starter for inclusion. If we're adding a new component it needs to be on the latest stable release of that component in general or some really strong case to not do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like in the comment above.

+ "'success' (on successful execution), 'failure' (if the script fails), and 'original' (preserves the unmodified FlowFile)."
)
@SystemResourceConsiderations({
@SystemResourceConsideration(resource = SystemResource.MEMORY, description = "Processing requires reading the entire FlowFile into memory")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is good this is here - but I'd recommend making it more dramatic in description.

As currently written the entire content of the flowfile is read into memory and I think as written that might be twice the size at least. On serialization it is also at least once but might be twice as well. Either way this is for large XML documents a serious consideration for users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will try to implement it.

<parent>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-basex-bundle</artifactId>
<version>2.4.0-SNAPSHOT</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of these will need to be bumped to 2.5.0-SNAPSHOT

ALSO the nifi-extension-bundles pom needs to know about the nifi-basex-bundle pom or it will not be built

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@@ -0,0 +1,277 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[22,43] (imports) AvoidStarImport: Using the '.' form of import should be avoided - org.apache.nifi.annotation.behavior..
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[25,34] (imports) AvoidStarImport: Using the '.' form of import should be avoided - org.apache.nifi.components..
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[168,13] (whitespace) WhitespaceAround: '}' is not followed by whitespace.
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[168,14] (whitespace) WhitespaceAround: 'else' is not preceded with whitespace.
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[218,5] (blocks) LeftCurly: '{' at column 5 should be on the previous line.
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[233,5] (blocks) LeftCurly: '{' at column 5 should be on the previous line.
[WARNING] src/main/java/org/apache/nifi/processors/basex/EvaluateBaseXQuery.java:[244,88] (whitespace) WhitespaceAround: '{' is not preceded with whitespace.

<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-standard-shared-nar</artifactId>
<version>2.3.0-SNAPSHOT</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.5.0-SNAPSHOT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I changed it.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurring with feedback from @joewitt, there are a large number of issues to be addressed before this can be considered for inclusion.

The most fundamental issue is the legacy version of the library referenced. The latest version in Maven Central is 11.9 at the time of this writing.

The hard-coded batching approach also raises some implementation questions.

@Override
public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
Context basexContext = new Context();
final List<FlowFile> flowFileBatch = session.get(50);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason for having this hard-coded batch of 50, as opposed to just handling one FlowFile and supporting batching through annotations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this strategy from EvaluateXQuery processor, but if it's not needed I can change it to one FlowFile.

@joewitt
Copy link
Contributor

joewitt commented May 12, 2025

Extensive feedback has been provided and considerable work is needed to reconsider for inclusion.

This is very likely a good candidate to exist as its own github project for now to evaluate for inclusion later with adoption/usage and signs of maintenance.

I am closing the PR for now but it can be revisited once feedback has been incorporated.

@joewitt joewitt closed this May 12, 2025
@TomaszK-stack
Copy link
Contributor Author

After corrections I will prepare new request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants