Multifold Improvement in Multi-Clause Boolean Query (Window Scoring Approach) #19046
Commits:
- Implemented canApproximate and rewrite for single clause bool queries in ApproximateBooleanQuery
- Added ApproximateScoreQuery wrapping in BoolQueryBuilder
- Enabled approximation in single clause bool queries by calling setContext in ApproximateBooleanQuery
- Removed redundancy by adding pattern matching to ApproximateScoreQuery check
- setContext in canApproximate to remove redundant context variable
- fix rewrite method to default to original query for multi-clause case
- Prevent multi-clause bool queries from using ApproximateBooleanQuery (for now)
- Fix failing tests for single clause boolean queries
- fix nested single clause boolean queries
- Enabled proper recursive rewriting to ensure clauses are properly rewritten
- Unwrap boolean query in setContext
- Removed redundant unwrap methods
- Fixed more integration tests
- Actually check whether nested query can be approximated

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>
… clauses Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>
Codecov Report
@@ Coverage Diff @@
## main #19046 +/- ##
==========================================
Coverage 72.86% 72.87%
- Complexity 69411 69505 +94
==========================================
Files 5647 5649 +2
Lines 319166 319430 +264
Branches 46165 46229 +64
==========================================
+ Hits 232565 232780 +215
- Misses 67779 67800 +21
- Partials 18822 18850 +28
Gradle check result for a3fa16e: FAILURE
Flaky test: #17937
Description
Implements a window-based scoring strategy that gives a multifold speedup for multi-clause boolean queries.
The speedup comes from optimizations at two levels:
- Early termination of `ConjunctionDISI`: once 10k hits have been collected from the `ConjunctionDISI`, `BulkScorer.score()` returns `DISI.NO_MORE_DOCS` (essentially the maximum integer) to signal to the `CancellableBulkScorer` that it should stop calling `score()`, since scoring/collecting is complete. Previously, `DISI.NO_MORE_DOCS` was returned only once the entire `ConjunctionDISI` had been exhausted. This optimization terminates early at 10k because any conjunction hits past that point are not displayed or sent back to the user (constant score case).
- Window scoring approach: as outlined in the issue, build clause iterators of window size only, run the conjunction once, and check whether 10k hits have been reached. If not, expand the window, collect a larger iterator, and run the conjunction again. This can be further optimized by caching a copy of the previous iterator and using `visit(DocIdSetIterator iterator)` to bulk add the already visited docIDs to the new, larger iterator, then continuing the BKD traversal from where the last iterator left off (using `BKDState`) to build the rest of the iterator. That would eliminate the redundant work done by each window (scoring/collecting docs that the previous iterator already traversed) and improve performance further. The additional memory usage would have to be benchmarked.

Related Issues
Resolves #19045
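The two optimizations in the description can be illustrated with a minimal, self-contained sketch. It is not the code in this PR and does not use Lucene: each clause is modelled as a sorted `int[]` of doc IDs standing in for what a BKD traversal would yield, and `WindowScoringSketch`, `collectWindow`, `score`, and `multiplesOf` are hypothetical names. It assumes a window means "the first N doc IDs of each clause", doubles the window on each retry, and reruns the conjunction from scratch (the uncached variant described above).

```java
import java.util.ArrayList;
import java.util.List;

final class WindowScoringSketch {
    // Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS; redefined here to stay self-contained.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Intersects the first `window` doc IDs of every clause, stopping once hitLimit hits are collected. */
    static void collectWindow(int[][] clauses, int window, int hitLimit, List<Integer> hits) {
        hits.clear(); // each window reruns the conjunction from scratch (no caching of earlier work)
        int[] pos = new int[clauses.length];
        int leadLen = Math.min(window, clauses[0].length);
        outer:
        for (int i = 0; i < leadLen; i++) {
            int doc = clauses[0][i];
            for (int c = 1; c < clauses.length; c++) {
                int len = Math.min(window, clauses[c].length);
                while (pos[c] < len && clauses[c][pos[c]] < doc) {
                    pos[c]++; // advance this clause to the first doc ID >= doc
                }
                if (pos[c] == len) {
                    break outer; // clause exhausted inside the window: no further matches possible
                }
                if (clauses[c][pos[c]] != doc) {
                    continue outer; // doc is missing from this clause
                }
            }
            hits.add(doc);
            if (hits.size() >= hitLimit) {
                return; // early termination: later hits would never be returned to the user
            }
        }
    }

    /** Expands the window until hitLimit hits are found or every clause is fully covered. */
    static int score(int[][] clauses, int initialWindow, int hitLimit, List<Integer> hits) {
        if (clauses.length == 0 || initialWindow <= 0) {
            throw new IllegalArgumentException("need at least one clause and a positive window");
        }
        int maxLen = 0;
        for (int[] clause : clauses) {
            maxLen = Math.max(maxLen, clause.length);
        }
        int window = initialWindow;
        while (true) {
            collectWindow(clauses, window, hitLimit, hits);
            if (hits.size() >= hitLimit || window >= maxLen) {
                return NO_MORE_DOCS; // tells the caller to stop calling score()
            }
            window = (int) Math.min((long) window * 2, maxLen); // assumed growth policy: doubling
        }
    }

    /** Test helper: returns {0, step, 2*step, ...} with `count` entries. */
    static int[] multiplesOf(int step, int count) {
        int[] docs = new int[count];
        for (int i = 0; i < count; i++) {
            docs[i] = i * step;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[][] clauses = { multiplesOf(2, 100), multiplesOf(3, 100) };
        List<Integer> hits = new ArrayList<>();
        score(clauses, 4, 5, hits); // small limit of 5 stands in for the 10k limit
        System.out.println(hits);
    }
}
```

Because the intersection of clause prefixes equals the true conjunction restricted to doc IDs below the smallest prefix maximum, every hit collected inside a window is a genuine hit. The cached variant described above would avoid the `hits.clear()` rerun by resuming from the previous window's state.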
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.