Difference between Bernoulli random variables

Question

Given are $n$ independent Bernoulli random variables with parameters $p_1,\dots,p_n$. We want to split them into two parts so as to minimize the expectation $\mathbb{E}[|X-Y|]$, where $X$ is the sum of the first part and $Y$ the sum of the second part.

What is the best way to split so that this expectation is minimized? A reasonable guess is that it is always to make the sums of the parameters $p_i$'s in the two parts as close as possible. Are there examples where this is not optimal?

Have you made use of the facts and references mentioned here, here, and on the statistics site? — Lee David Chung Lin
– Lee David Chung Lin, Commented Apr 16, 2019 at 21:33
Sure, both $X$ and $Y$ are Poisson binomial distributions, but it's not clear how that helps with dealing with $\mathbb{E}[|X-Y|]$. — pi66
– pi66, Commented Apr 17, 2019 at 0:37
Of course it helps. You can do the analysis built on known results, instead of starting from scratch. In addition, there are many other things mentioned in those posts besides just the name. — Lee David Chung Lin
– Lee David Chung Lin, Commented Apr 17, 2019 at 0:38
Which known results? From your links I don't see much more helpful info than that $X,Y$ are Poisson binomial distributions. — pi66
– pi66, Commented Apr 17, 2019 at 0:41
@antkam Indeed I personally think the exact solution is difficult (or "intractable" as you say). In two comments that are now both deleted, I suggested asymptotic analysis and numerical methods (which are mentioned in the posts I linked), and the OP said such approaches are not of interest. — Lee David Chung Lin
– Lee David Chung Lin, Commented Apr 18, 2019 at 16:19

antkam · Accepted Answer · 2019-04-21 17:47:10Z

UPDATE: There is a counter-example!

The set is $[p_i] = [0.24, 0.24, 0.24, 0.28, 0.5, 0.5],$ and I considered two splits. The first one is a perfect split ($E[X-Y] = 0$) while the second one isn't. If my code has no bugs, here are the relevant values, with minima in red.

$$ \begin{matrix} \text{Split} & E[|X-Y|^2] & E[|X-Y|] & E[X-Y] \\ [0.24, 0.24, 0.24, 0.28] \text{ vs } [0.5, 0.5] & \color{red}{1.2488} &0.84330 & \color{red}{0} \\ [0.24, 0.24, 0.5] \text{ vs } [0.24, 0.28, 0.5] & 1.2504 & \color{red}{0.84003} & 0.04 \end{matrix} $$

So the perfect split ($E[X-Y] = 0$) does minimize $E[|X-Y|^2]$ as my older answer proved. However, the imperfect split actually has a lower $E[|X-Y|]$.

Speculation: It so happens that $P(X-Y=0)$ is higher for the imperfect split ($0.3476$) than for the perfect split ($0.3429$). I wonder if this has any bearing on the question. Also, I wonder if some kind of "symmetry" plays a role, although I did not bother to calculate the skews.

P.S. I found this by considering $[0.25, 0.25, 0.25, 0.25, 0.5, 0.5]$ and evaluating the two different perfect splits. As my older answer shows, both perfect splits have the same $E[|X-Y|^2]$ value. But as I hoped, they have different $E[|X-Y|]$ values. Then I simply perturb them a bit to arrive at the counter-example.

ORIGINAL: Not an answer, but too long for a comment.

I don't know how to minimize $E[|X-Y|]$, but your intuition (split the $p_i$'s as evenly as possible) actually minimizes $E[|X-Y|^2]$. Note that the two are not the same thing in general.

$E[|X-Y|^2] = E[(X-Y)^2] = E[X-Y]^2 + Var(X-Y)$
$E[X-Y] = \sum_{i \in \mathcal{X}} p_i - \sum_{j \in \mathcal{Y}} p_i,$ so $E[X-Y]^2$ is minimized when the $p_i$'s are as evenly split as possible. (Note that this is conceptually simple but computationally... NP-complete I think.)
Meanwhile, $Var(X - Y) = Var(X) + Var(Y) = Var(X+Y) = \sum_{all\ i} p_i (1-p_i) =$ constant. So this does not figure in the minimization.

In conclusion: splitting them as evenly as possible minimizes $E[|X-Y|^2]$.

Further thoughts: Since $E[|X-Y|^2] = E[|X-Y|]^2 + Var(|X-Y|),$ any counter-example would be such that the optimal (uneven) split has higher $E[|X-Y|^2]$, higher $Var(|X-Y|)$, but lower $E[|X-Y|]$, compared to the even split.

I found such a situation rather improbable (pardon the pun), but I don't have a proof either way.

Stack Exchange Network

Difference between Bernoulli random variables

1 Answer 1

You must log in to answer this question.

Linked

Hot Network Questions

Difference between Bernoulli random variables

1 Answer 1

You must log in to answer this question.

Linked

Related

Hot Network Questions