@sawansri sawansri commented Aug 12, 2025

Description

Implements a window-based scoring strategy that gives a multi-fold speedup for multi-clause boolean queries.

The speedup comes from optimizations at two levels:

  1. Early Termination of ConjunctionDISI - Once 10k hits have been collected from the ConjunctionDISI, BulkScorer.score() returns DISI.NO_MORE_DOCS (DocIdSetIterator.NO_MORE_DOCS, i.e. Integer.MAX_VALUE), which signals the CancellableBulkScorer to stop calling score() because scoring and collecting are complete. Previously, DISI.NO_MORE_DOCS was returned only after the entire ConjunctionDISI had been exhausted. This optimization terminates early at 10k because, in the constant-score case, conjunction hits past that point are never displayed or sent back to the user.

  2. Window Scoring Approach - As outlined in the issue, build clause iterators limited to the window size, run the conjunction once, and check whether 10k hits have been reached. If not, expand the window, collect a larger iterator for each clause, and run the conjunction again. This can be optimized further by caching a copy of the previous iterator and using visit(DocIdSetIterator iterator) to bulk-add the already-visited docIDs to the new, larger iterator, then continuing the BKD traversal from where the last iterator left off (using BKDState) to build the rest of the iterator. That would eliminate the redundant work each window currently does (scoring and collecting docs that the previous iterator already traversed) and improve performance further. The additional memory usage would have to be benchmarked.
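
The two levels above can be illustrated with a minimal, self-contained sketch. This is not the code in this PR and it does not use Lucene: each clause is modeled as a sorted int[] of doc IDs, a "window" is simply a prefix of that array (the real iterators are built from a BKD traversal), and the names WindowScoringSketch, conjunction, and windowedSearch are hypothetical. It also rebuilds every window from scratch, which is exactly the redundant work the iterator-caching idea would remove.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WindowScoringSketch {

    /** Intersects sorted doc ID arrays, stopping as soon as maxHits matches are collected. */
    static List<Integer> conjunction(int[][] clauses, int maxHits) {
        List<Integer> hits = new ArrayList<>();
        if (clauses.length == 0 || maxHits <= 0) {
            return hits;
        }
        int[] pos = new int[clauses.length]; // read position per clause (index 0 unused)
        outer:
        for (int candidate : clauses[0]) {
            for (int i = 1; i < clauses.length; i++) {
                while (pos[i] < clauses[i].length && clauses[i][pos[i]] < candidate) {
                    pos[i]++;
                }
                if (pos[i] == clauses[i].length) {
                    break outer; // clause i is exhausted, no further match is possible
                }
                if (clauses[i][pos[i]] != candidate) {
                    continue outer; // candidate is missing from clause i
                }
            }
            hits.add(candidate);
            if (hits.size() >= maxHits) {
                break; // early termination: later hits would never be returned
            }
        }
        return hits;
    }

    /** Runs the conjunction over growing windows until maxHits is reached or all clauses are fully covered. */
    static List<Integer> windowedSearch(int[][] fullClauses, int initialWindow, int maxHits) {
        if (initialWindow < 1) {
            throw new IllegalArgumentException("initialWindow must be >= 1");
        }
        int window = initialWindow;
        while (true) {
            int[][] windowed = new int[fullClauses.length][];
            boolean allExhausted = true;
            for (int i = 0; i < fullClauses.length; i++) {
                int len = Math.min(window, fullClauses[i].length);
                windowed[i] = Arrays.copyOf(fullClauses[i], len); // simplified stand-in for a window-sized iterator
                if (len < fullClauses[i].length) {
                    allExhausted = false;
                }
            }
            List<Integer> hits = conjunction(windowed, maxHits);
            if (hits.size() >= maxHits || allExhausted) {
                return hits;
            }
            // Not enough hits yet: expand the window and run the conjunction again.
            window = (window > Integer.MAX_VALUE / 2) ? Integer.MAX_VALUE : window * 2;
        }
    }
}
```

For example, with the clauses {1..8}, {2, 4, 6, 8, 10} and {4, 8, 12}, an initial window of 2 and a hit limit of 1, the sketch finds no hit in the first window, doubles the window, and stops as soon as doc 4 is collected; with a hit limit of 10 it keeps expanding until every clause is fully covered and returns both hits, 4 and 8.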

Related Issues

Resolves #19045

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

❌ Gradle check result for 530c12c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 059c177: FAILURE

@github-actions bot added the enhancement and Search:Performance labels on Aug 13, 2025

❌ Gradle check result for ecf2a08: null

❌ Gradle check result for b368016: FAILURE

❌ Gradle check result for 1b603be: FAILURE

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Implemented canApproximate and rewrite for single clause bool queries in ApproximateBooleanQuery

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Added ApproximateScoreQuery wrapping in BoolQueryBuilder

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Enabled approximation in single clause bool queries by calling setContext in ApproximateBooleanQuery

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Removed redundancy by adding pattern matching to ApproximateScoreQuery check

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

setContext in canApproximate to remove redundant context variable

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

fix rewrite method to default to original query for multi-clause case

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Prevent multi-clause bool queries from using ApproximateBooleanQuery (for now)

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Fix failing tests for single clause boolean queries

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

fix nested single clause boolean queries

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Enabled proper recursive rewriting to ensure clauses are properly rewritten

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Unwrap boolean query in setContext

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Removed redundant unwrap methods

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Fixed more integration tests

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

Actually check whether nested query can be approximated

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>
… clauses

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

✅ Gradle check result for d9b510a: SUCCESS

codecov bot commented Aug 22, 2025

Codecov Report

❌ Patch coverage is 65.64885% with 90 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.87%. Comparing base (f81d75a) to head (753f3a8).
⚠️ Report is 14 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../approximate/ApproximateBooleanScorerSupplier.java | 64.86% | 22 Missing and 17 partials ⚠️ |
| ...ch/search/approximate/ApproximateBooleanQuery.java | 67.10% | 19 Missing and 6 partials ⚠️ |
| ...search/approximate/ApproximatePointRangeQuery.java | 73.68% | 10 Missing and 5 partials ⚠️ |
| ...arch/search/approximate/ApproximateScoreQuery.java | 16.66% | 8 Missing and 2 partials ⚠️ |
| ...a/org/opensearch/index/query/BoolQueryBuilder.java | 83.33% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##               main   #19046    +/-   ##
==========================================
  Coverage     72.86%   72.87%            
- Complexity    69411    69505    +94     
==========================================
  Files          5647     5649     +2     
  Lines        319166   319430   +264     
  Branches      46165    46229    +64     
==========================================
+ Hits         232565   232780   +215     
- Misses        67779    67800    +21     
- Partials      18822    18850    +28     

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

❌ Gradle check result for a3fa16e: FAILURE

@sawansri

❌ Gradle check result for a3fa16e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Flaky test: #17937

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

✅ Gradle check result for 753f3a8: SUCCESS

Labels: enhancement, Search:Performance
Project status: In Progress
Successfully merging this pull request may close this issue: [Approximation Framework] Multifold Improvement in Multi-Clause Boolean Query (Window Scoring Approach)