My problem
Given a partially-ordered set $(S, <)$, I want to compute the set of maximal elements $$ S_{max} =\{a\in S | \nexists b \in S, a < b \} $$
while making as few comparisons as possible between elements.
A couple of examples:
- $S=\{1, 2, ... P\}$ with the usual comparison operator. Then $S_{max}=\{P\}$. This example is somewhat trivial because any set with a strict total order, as is the case here, has a unique maximal element that can be determined in linear time.
- $S=\{(x,y)| x,y\in\{1, 2, ... P\}\}$ with the partial ordering $(x,y)<(x',y') \iff x<x' \text{ and } y<y'$ (in English: the set of all pairs of integers between $1$ and $P$, where a pair is strictly smaller than another if both values are pairwise strictly smaller). Then the maximal set is $S_{max}=\{(x,y)\in S|x=P \text{ or } y=P\}$ (in English: the set of pairs with at least one value equal to the maximum value; such pairs are not strictly smaller than any other element, and any other pair is strictly smaller than $(P, P)$).
My real-world use case is in fact the second example, but with $S$ a subset of all possible pairs of integers between $1$ and $P$ (rather than all such pairs). (Feel free to answer either specifically for that case, or more generally.)
Semi-obvious complexity bounds
Assume the set has $N$ elements.
No algorithm can have a best case (and thus an average case) better than $O(N)$: any algorithm, on any input data, must make at least $O(N)$ comparisons, because every element of the set must be compared at least once to another element.
Any algorithm has a worst case in $O(N^2)$: Consider input data with an empty ordering (i.e., there are no elements such that $a<b$). Then, $S_{max}=S$; but to know this, the algorithm needs to compute every pairwise comparison (and see that it fails), which is $O(N^2)$ comparisons.
There is an algorithm that guarantees $O(N^2)$ for any input data: the brute-force algorithm is as follows:
candidate_set := $S$ for all $a,b \in S$: if a<b: remove a from candidate_set if b<a: remove b from candidate_set return candidate_set In fact, one could even produce a topological ordering of the set in $O(N^2)$ (see Kahn’s algorithm).
Is there a quicker way?
I "only" want the maximal elements, not a full topological ordering, so there might be a quicker way to proceed than $O(N^2)$. Every comparison that yields a true value allows us to eliminate an element from further consideration; therefore, if there is a heuristic to find comparisons that are likely to be true, starting by computing those comparisons first allows us to cut the search space quicker.
For instance, for my use case (see above), I feel the following pseudocode gives a good start in many conditions:
candidate_set := $S$ comparison_targets_set = $CTS$ := $\emptyset$ possible_first_values := $\{x|(x,y)\in S\}$ for $x_0$ in possible_first_values: associated_y_values := $\{y|(x_0,y)\in S\}$ add $(x_0$, max(associated_y_values))$ to $CTS$ # do the same thing by bucketing on second values rather than first values # then run the brute-force algorithm, but only for comparisons where the right hand side is in $CTS$ The preprocessing steps are all in $O(N)$. Applying the brute-force algorithm afterwards is $O(N\times card(CTS))$ which might be significantly smaller than $O(N^2)$ in practice. I suspect that for uniform I.I.D. random pairs of integers, $card(CTS)\approx \sqrt{N}$: basically, the points are a random blob in 2D, $CTS$ is roughly a subset of the convex envelope, and a random blob of $N$ points has about $\sqrt{N}$ points on its convex envelope. Needless to say, this is not a rigorous proof, but if it is true, then the pseudocode above runs in $O(N^{3/2}$) average time.