
My professor mentioned that if we already know the divide-and-conquer algorithm for the closest pair of points in two dimensions, it's easy to come up with a similar algorithm for three dimensions. But I am a bit confused about what changes we have to make.

I think we'll divide the points into two sets based on the $x$ coordinate and find the closest pair of points in each set (let's say that the minimum distance found is $\delta$), but instead of a strip of width $2\delta$, since we have three dimensions we'll use a slab to compare the distances between points that lie in different sets. But I don't know what to do next.


2 Answers


As you say, you can split the points into two sets $A$ and $B$ of roughly the same size, depending on whether their $x$ coordinate is smaller than, or at least, some threshold $x_0$. Solve the problem recursively on $A$ and $B$, and let $\delta$ be the minimum distance between any two points in the same set.

Now, instead of a strip of width $2\delta$, consider a slab with width $2\delta$ (centered on $x_0$) and with unbounded height and depth (towards the $y$ and $z$ axes).

Notice that a necessary condition for a pair of points $p_a \in A$ and $p_b \in B$ to be at distance less than $\delta$ is that both $p_a$ and $p_b$ belong to the slab.

Imagine subdividing this slab into cubes of side $\delta/2$, so that no cube crosses $x_0$. Pick any point $p$ in the slab and consider the "supercube" $C_p$ of side $2\delta$ centered at $p$. All points outside of $C_p$ are either outside of the slab or too far away from $p$, even when we just consider the difference of the $y$ or $z$ coordinates. This means that, if there is a point at distance at most $\delta$ from $p$, then it must belong to $C_p$. However, $C_p$ intersects at most $5^3$ cubes. This means that for each point we only need to check the points in at most $5^3$ cubes.

How many points are there in these cubes? At most $5^3$. To see this, notice that a cube cannot contain more than one point. Indeed, each cube is entirely contained in either $A$'s or $B$'s side of the slab, the maximum distance between two points in the same cube is $\frac{\delta}{2} \cdot \sqrt{3} < \delta$, and two points in the same set cannot be at distance less than $\delta$ by the definition of $\delta$.
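To make the counting argument concrete, here is a minimal Python sketch of the candidate check inside the slab: points are hashed into axis-aligned cubes of side $\delta/2$, and each point is compared only against points in the $5 \times 5 \times 5$ block of cubes that its supercube can intersect. The function name `closest_slab_pair` and the use of a hash grid (rather than an explicit subdivision aligned to $x_0$) are illustrative choices, not from the answer.

```python
import math
from collections import defaultdict

def closest_slab_pair(points, delta, x0):
    """Check candidate pairs inside the slab |x - x0| < delta by hashing
    points into cubes of side delta/2 and scanning the 5x5x5 block of
    cubes (the "supercube") around each point."""
    side = delta / 2
    slab = [p for p in points if abs(p[0] - x0) < delta]
    grid = defaultdict(list)
    for p in slab:
        grid[tuple(int(c // side) for c in p)].append(p)
    best, best_pair = delta, None
    for p in slab:
        ci, cj, ck = (int(c // side) for c in p)
        # the supercube of side 2*delta intersects at most 5 cubes per axis
        for di in range(-2, 3):
            for dj in range(-2, 3):
                for dk in range(-2, 3):
                    for q in grid.get((ci + di, cj + dj, ck + dk), []):
                        if q is not p and math.dist(p, q) < best:
                            best, best_pair = math.dist(p, q), (p, q)
    return best, best_pair
```

Note that the constant per-point work relies on the packing argument above: each cube holds at most one point, so each lookup touches $O(1)$ candidates.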

How do we check these points efficiently? Let's start with an observation: given a collection $S$ of points and some parameter $\delta$ with the guarantee that each point has at most constantly many points within distance $\delta$, the 2D algorithm that you already know can be used to enumerate all pairs of points at distance at most $\delta$ in time $O(|S| \log |S|)$.

Take all the points in the slab and project them onto the 2D plane perpendicular to the $x$ axis and passing through $x_0$ (i.e., "squish the slab" along the $x$ axis). We only need to consider the pairs of points whose projections are at distance at most $\delta$. By the above argument, for each point there are at most $5^3$ other points within such distance.

But then we can use the 2D-algorithm to get a list of all pairs of points to check! This takes time $O(|S| \log |S|)$ where $|S|$ is the number of points in the slab. Let's be pessimistic and assume that all current points end up in the slab. We have the following recurrence equation, where $n$ denotes the number of points: $$ T(n) = 2T(n/2) + O(n \log n), $$ which has solution $T(n) = O(n \log^2 n)$ as you can see using the master theorem.
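Putting the pieces together, here is a hedged Python sketch of the whole recursion. For simplicity it replaces the recursive 2D call in the merge step with a hash grid over $\delta/2$ cubes (checking the $5 \times 5 \times 5$ neighborhood of each slab point), which performs the same candidate check in expected linear time; the function name `closest_pair_3d` and these implementation details are illustrative, not from the answer.

```python
import math
from collections import defaultdict

def closest_pair_3d(points):
    """Return the smallest pairwise Euclidean distance among 3D points."""
    def solve(pts):  # pts is sorted by x coordinate
        n = len(pts)
        if n <= 3:  # brute-force the base case
            return min((math.dist(p, q)
                        for i, p in enumerate(pts) for q in pts[i + 1:]),
                       default=math.inf)
        mid = n // 2
        x0 = pts[mid][0]  # splitting plane
        delta = min(solve(pts[:mid]), solve(pts[mid:]))
        if delta == 0:    # duplicate points: cannot do better
            return 0.0
        # Merge: only points within delta of the plane can form a closer pair.
        slab = [p for p in pts if abs(p[0] - x0) < delta]
        side = delta / 2
        grid = defaultdict(list)
        for p in slab:
            grid[tuple(int(c // side) for c in p)].append(p)
        best = delta
        for p in slab:
            ci, cj, ck = (int(c // side) for c in p)
            # the supercube of side 2*delta spans at most 5 cells per axis
            for di in range(-2, 3):
                for dj in range(-2, 3):
                    for dk in range(-2, 3):
                        for q in grid.get((ci + di, cj + dj, ck + dk), []):
                            if q is not p:
                                best = min(best, math.dist(p, q))
        return best
    return solve(sorted(points))
```

Sorting once by $x$ and splitting at the median keeps the two recursive calls balanced, matching the $T(n) = 2T(n/2) + (\text{merge cost})$ recurrence discussed above.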

It turns out that you can be more clever in the selection of $x_0$. You can pick, in time $O(n)$, a threshold that (i) splits the points into $A$ and $B$ such that $\min\{|A|, |B|\} \ge c n$ for some constant $c>0$, and (ii) ensures that $S$ contains $O(n^{1-\epsilon})$ points for some constant $\epsilon>0$. With this clever selection you get the following recurrence: $$ \begin{align*} T(n) &\le T(cn) + T((1-c)n) + O(n) + O(n^{1-\epsilon} \log n^{1-\epsilon}) \\ & \le T(cn) + T((1-c)n) + O(n). \end{align*} $$ This recurrence has solution $T(n) = O(n \log n)$. To see this, notice that the recursion tree has depth $O(\log n)$ and that the overall time spent on the recursive calls at each level of the tree is $O(n)$.

See this paper for an explanation of how to select $x_0$.

  • And the complexity here remains $O(n \log n)$? Can you explain to me how? — Commented Nov 9, 2022 at 6:59
  • @Battle. I've added an explanation to the answer. — Commented Nov 9, 2022 at 9:51

The 2D approach you know probably uses sorting. Here I describe how to do it without sorting. This approach is straightforward to generalize to higher dimensions.

Below, "$\delta$-close" means "the distance is at most $\delta$".

We divided the points based on the $x$-coordinate, found $\delta$, and now it remains to do the "merge" part. We consider the points that are $\delta$-close to the border. Several observations:

  1. If $(x_1,y_1)$ and $(x_2,y_2)$ are $\delta$-close, then we have $|y_1 - y_2| \le \delta$.
  2. For an integer $k$, let $S_k = \{(x,y) \mid y \in [k\delta, (k+2)\delta] \text{ and $(x,y)$ is $\delta$-close to the border}\}$. If $(x_1,y_1)$ and $(x_2,y_2)$ are $\delta$-close, then, by Observation 1, there exists an integer $k$ such that both $(x_1,y_1) \in S_k$ and $(x_2,y_2) \in S_k$ (take $k = \lfloor \min(y_1, y_2)/\delta \rfloor$).
  3. For any $k$, $|S_k|$ is bounded by a constant: $S_k$ fits in a $2\delta \times 2\delta$ box, and points on the same side of the border are at distance at least $\delta$ from each other, so each side contributes only $O(1)$ points.

As a result, there are at most $O(n)$ non-empty sets $S_k$ (each point belongs to at most two of them). By Observation 2, it suffices, for each non-empty $S_k$, to try all possible pairs of points from $S_k$, which, by Observation 3, takes constant time per set.
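The observations above can be sketched in a few lines of Python. This is a hedged illustration of the merge step only, for 2D points; the function name `merge_without_sorting` and the use of `defaultdict` buckets keyed by $\lfloor y/\delta \rfloor$ are my choices, not from the answer. Bucket $k$ together with bucket $k+1$ covers the window $S_k$, so scanning each bucket with its successor tries every $\delta$-close pair.

```python
import math
from collections import defaultdict

def merge_without_sorting(strip, delta):
    """Merge step: bucket the strip points (those delta-close to the
    border) by floor(y / delta); any delta-close pair lies in some
    window S_k made of two consecutive buckets."""
    buckets = defaultdict(list)
    for p in strip:
        buckets[int(p[1] // delta)].append(p)
    best = delta
    for k, pts in buckets.items():
        # S_k covers y in [k*delta, (k+2)*delta]: bucket k plus bucket k+1
        window = pts + buckets.get(k + 1, [])
        for i, p in enumerate(window):
            for q in window[i + 1:]:
                best = min(best, math.dist(p, q))
    return best
```

Since each window has constant size by Observation 3, the double loop over a window is $O(1)$ and the whole merge is linear in the number of strip points.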

  • So what is the complexity of your algorithm? Is it linear? — Commented Nov 9, 2022 at 8:04
  • @Battle. There is a lower bound of $\Omega(n \log n)$ for this problem, where $n$ is the number of points. — Commented Nov 9, 2022 at 8:40