I’m working on the planted perfect matching problem in random $k$-uniform hypergraphs $k \ge 3$, and I’m stuck on rigorizing the impossibility (lower bound) side of what looks like the information-theoretic threshold. I’d be very grateful for references or structural ideas.
Model
Let $k \ge 3$ and $n \in \mathbb{N}$. Consider a $k$-uniform hypergraph $H$ on $N = kn$ vertices, either:
- $k$-partite with parts of size $n$, or
- a non-partite $k$-uniform model (both versions behave similarly for my purposes).
We generate $H$ by:
Planted perfect matching $M^*$: a fixed perfect matching of size $n$ (partitioning the vertices).
Noise edges $U$: we either
- include each potential $k$-tuple as a noise edge independently with probability
$$ p = \frac{d}{n^{k-1}} $$ (so $\mathbb{E}|U| \approx dn$), or - equivalently, condition on $|U| = m = dn$.
- include each potential $k$-tuple as a noise edge independently with probability
We only observe the unlabeled hypergraph $$ H = M^* \cup U, $$ and the inference problem is to recover the planted matching $M^*$.
What is already known / what I have
For a fixed $\alpha \in (0,1)$, call a perfect matching $M$ a $t$-swap matching (or “at distance $t$ from $M^*$”) if $$ |M \triangle M^*| = 2t, $$ i.e., $M$ uses exactly $t = \alpha n$ noise edges and $n - t$ planted edges.
Using a first-moment / union-bound argument over all $t$-swap matchings (for linear $t$), I compute an explicit rate function and get a threshold of the form $$ d^*(k) \approx e^{k-1}-1 \quad (k \to \infty), $$ such that (roughly):
- If $d < d^*(k)$, then for each fixed $\alpha$, the expected number of $t$-swap matchings is $\exp\{-c(\alpha,k)\,n\}$; summing over $t$ shows that with high probability $M^*$ is the unique perfect matching. This is the achievability (upper) bound.
I am now trying to prove the converse:
If $d > d^*(k)$, then exact recovery of $M^*$ is impossible, because there are exponentially many alternative perfect matchings indistinguishable from $M^*$.
Approach: Kahn-style reverse deletion and the obstruction
I’m trying to adapt Jeff Kahn’s reverse-deletion martingale method from his proof of Shamir’s problem.
Start with the full noise set $U$ and delete noise edges in a uniform random order: $$ e_1, e_2, \dots, e_{|U|}. $$
Let $$ H_i = M^* \cup (U \setminus \{e_1,\dots,e_i\}) $$ be the hypergraph after $i$ deletions.
Fix a deviation $t = \alpha n$. Let $\mathcal{F}_t(H_i)$ be the set of $t$-swap perfect matchings in $H_i$, and define $$ A_i := |\mathcal{F}_t(H_i)|. $$
When we delete $e_i$, a fraction of the matchings are killed: $$ A_i = A_{i-1}(1 - \xi_i), $$ where $\xi_i$ is the “heaviness” of edge $e_i$: the fraction of $t$-swap matchings that contain $e_i$.
Conditionally, one can compute $$ \gamma_i := \mathbb{E}[\xi_i \mid H_{i-1}] = \frac{t}{m_i}, $$ where $m_i = |U \cap E(H_{i-1})|$. Summing $\gamma_i$ over a certain “log–linear” window $m \in [dn, \theta n \log n]$, one exactly recovers the exponent from the first-moment threshold $d^*(k)$. So if we could replace $\log(1-\xi_i)$ by $-\xi_i$, we would match the upper bound.
The problem is controlling the second-order loss $$ Z := \sum_i \big(\log(1-\xi_i) + \xi_i\big) \le 0, $$ which is essentially dictated by the sum of squares $\sum_i \xi_i^2$.
Define $$ D_m := \log(1 - \bar{p}_m) - \frac{1}{m}\sum_e \log(1 - p_e), $$ where:
- $p_e$ is the inclusion probability of edge $e$ under the uniform measure on $\mathcal{F}_t(H_m)$, and
- $\bar{p}_m = t/m$.
Then one has $$ \mathbb{E}[\log(1-\xi_i)\mid H_m] = \log(1 - \bar{p}_m) - D_m, $$ so $$ Z = -\sum_m D_m. $$
Analytically, using a Jensen / Taylor argument for $f(x) = \log(1-x)$, one can show $$ D_m \;\lesssim\; \operatorname{Var}_m(p_e), $$ where the variance is over a uniform random edge $e$ in $H_m$.
Thus, what I really need is a bound of the form $$ \sum_m D_m \;\le\; K_k\,n $$ for some $K_k = K(k)$ (ideally decaying like $1/d$ as $d$ increases). A sufficient condition would be something like $$ \operatorname{Var}_m(p_e) \ll \bar{p}_m^2 $$ uniformly in $m$ in the $[dn, \theta n \log n]$ window, or, more concretely, a “no heavy edges” statement: no edge is contained in more than a constant multiple of the typical fraction $\bar{p}_m = t/m$ of $t$-swap matchings.
This is precisely where I get stuck: I don’t know how to control the edge marginals $p_e$ in this conditional planted ensemble.
I’ve been working on this for a while; another professor recently told me the Kahn-style martingale approach is probably doomed in this sparse regime, but I’d really like to understand whether there is already something in the literature that rules it out or, conversely, a technique (SSM, cluster expansion, switching, …) that could give the needed variance bounds. As a sanity check, I’ve simulated the $k=3$ case with $n$ from $[50,100]$ using a SAT solver to test existence of alternate perfect matchings; empirically the threshold where the probability of an alternate matching jumps from $\sim 0.5$ to $\sim 1$ matches the $d^*(3)$ I computed, but of course that’s not a proof.
Cross posted on Math Overflow