Remove usage of IndexSearcher#search(Query, Collector) from join package#13747
Remove usage of IndexSearcher#search(Query, Collector) from join package#13747msfroh wants to merge 3 commits intoapache:mainfrom
Conversation
javanna left a comment
There was a problem hiding this comment.
I think it's a good approach. Let's prioritize removing those leftover usages and add the missing collector managers as a next step.
lucene/join/src/java/org/apache/lucene/search/join/JoinUtil.java Outdated Show resolved Hide resolved
| @msfroh thanks a lot for picking this up!!! |
048eb29 to 66457ff Compare | @javanna -- I've managed to get the remaining numeric / terms collectors in the Join module working with multiple search threads. I can add them to this PR, but the diff is pretty massive. I'm thinking of holding off for another PR, but I'm happy to go either way. There is probably value in "atomically" jumping from the current "always single-threaded" mode straight to "everything works with a multithreaded searcher", versus this PR's current state where global ordinal-based joins work with a multithreaded searcher but numeric/term-based joins don't. Thanks a lot for the review! |
javanna left a comment
There was a problem hiding this comment.
I left a couple more comments, I am fine getting this in and focusing on the missing parallel collector managers as a follow-up. Thanks for the work you put into this!
lucene/join/src/java/org/apache/lucene/search/join/MergeableCollector.java Show resolved Hide resolved
For global ordinal-based join, we can support concurrent search. For others, we fail if we're called from a multithreaded searcher.
Since I plan to implement numeric and term collectors that support merging all collectors into a single collector, it makes sense to move MergeableCollectorManager into its own top-level class.
This change removes MergeableCollector (and MergeableCollectorManager), wraps all of the custom Collector classes in their own CollectorManager, and removes all remaining occurrences of CollectorManager.forSequentialExecution from the tests. This also adds all of the other join collectors, bringing the join module fully into the CollectorManager club.
66457ff to 7ff4804 Compare | Okay -- I wrapped all of the |
| This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
| @javanna -- obviously we missed the Lucene 10 cutoff for this, but it doesn't break any public APIs. Do you think it makes sense to merge this as an incremental improvement? |
| Hey @msfroh I need to review this once again and resume context on it. I will try to get to it this week. |
| This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
| This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
Description
Relates to #12892
For global ordinal-based join, we can support concurrent search. For numeric and term-based joins, we fail if we're called from a multithreaded searcher.
I can implement concurrent versions of the other join collectors, but wanted to get a first pass that removes the uses of
IndexSearcher#search(Query, Collector).Note that for the cases that still assume a single-threaded searcher, I used a version of
CollectorManager#wrap(Collector)from #13735, with my guess for where it will end up based on feedback so far.