Skip to content

Remove usage of IndexSearcher#search(Query, Collector) from join package#13747

Open
msfroh wants to merge 3 commits intoapache:mainfrom
msfroh:join_collectormanager
Open

Remove usage of IndexSearcher#search(Query, Collector) from join package#13747
msfroh wants to merge 3 commits intoapache:mainfrom
msfroh:join_collectormanager

Conversation

@msfroh
Copy link
Contributor

@msfroh msfroh commented Sep 9, 2024

Description

Relates to #12892

For global ordinal-based join, we can support concurrent search. For numeric and term-based joins, we fail if we're called from a multithreaded searcher.

I can implement concurrent versions of the other join collectors, but wanted to get a first pass that removes the uses of IndexSearcher#search(Query, Collector).

Note that for the cases that still assume a single-threaded searcher, I used a version of CollectorManager#wrap(Collector) from #13735, with my guess for where it will end up based on feedback so far.

Copy link
Contributor

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good approach. Let's prioritize removing those leftover usages and add the missing collector managers as a next step.

@javanna
Copy link
Contributor

javanna commented Sep 10, 2024

@msfroh thanks a lot for picking this up!!!

@msfroh msfroh force-pushed the join_collectormanager branch from 048eb29 to 66457ff Compare September 10, 2024 23:09
@msfroh
Copy link
Contributor Author

msfroh commented Sep 11, 2024

@javanna -- I've managed to get the remaining numeric / terms collectors in the Join module working with multiple search threads.

I can add them to this PR, but the diff is pretty massive. I'm thinking of holding off for another PR, but I'm happy to go either way.

There is probably value in "atomically" jumping from the current "always single-threaded" mode straight to "everything works with a multithreaded searcher", versus this PR's current state where global ordinal-based joins work with a multithreaded searcher but numeric/term-based joins don't.

Thanks a lot for the review!

Copy link
Contributor

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple more comments, I am fine getting this in and focusing on the missing parallel collector managers as a follow-up. Thanks for the work you put into this!

For global ordinal-based join, we can support concurrent search. For others, we fail if we're called from a multithreaded searcher.
Since I plan to implement numeric and term collectors that support merging all collectors into a single collector, it makes sense to move MergeableCollectorManager into its own top-level class.
This change removes MergeableCollector (and MergeableCollectorManager), wraps all of the custom Collector classes in their own CollectorManager, and removes all remaining occurrences of CollectorManager.forSequentialExecution from the tests. This also adds all of the other join collectors, bringing the join module fully into the CollectorManager club.
@msfroh msfroh force-pushed the join_collectormanager branch from 66457ff to 7ff4804 Compare September 20, 2024 02:11
@msfroh
Copy link
Contributor Author

msfroh commented Sep 20, 2024

Okay -- I wrapped all of the Collectors in CollectorManagers, and managed to remove all uses of CollectorManager.forSequentialExecution. I also went ahead and added the remaining Collectors to this PR.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 5, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Oct 5, 2024
@msfroh
Copy link
Contributor Author

msfroh commented Jan 10, 2025

@javanna -- obviously we missed the Lucene 10 cutoff for this, but it doesn't break any public APIs. Do you think it makes sense to merge this as an incremental improvement?

@github-actions github-actions bot removed the Stale label Jan 11, 2025
@javanna
Copy link
Contributor

javanna commented Jan 20, 2025

Hey @msfroh I need to review this once again and resume context on it. I will try to get to it this week.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 4, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Feb 4, 2025
@github-actions github-actions bot removed the Stale label Oct 1, 2025
@github-actions
Copy link
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Oct 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2 participants