In Cover and Thomas, "Elements of Information Theory", second edition, I encountered this question:
Let $K$ and $K_0$ be symmetric positive definite matrices of the same size (s.p.d.m). Show that $$f(K)=\log\left(\frac{\det(K+K_0)}{\det(K)}\right)$$ is a convex function of $K$ (here $\det$ denotes the determinant).
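Before looking for a proof I sanity-checked the claim numerically. This is only a sketch (random instances, not a proof); the helper `random_spd`, the problem size, and the tolerance are just my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """A random symmetric positive definite n x n matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

def f(K, K0):
    """f(K) = log(det(K + K0) / det(K)), via slogdet for numerical stability."""
    return np.linalg.slogdet(K + K0)[1] - np.linalg.slogdet(K)[1]

n = 4
for _ in range(1000):
    K0, K1, K2 = random_spd(n), random_spd(n), random_spd(n)
    lam = rng.uniform(0, 1)
    lhs = f(lam * K1 + (1 - lam) * K2, K0)
    rhs = lam * f(K1, K0) + (1 - lam) * f(K2, K0)
    assert lhs <= rhs + 1e-9  # convexity inequality holds on these instances
```

No counterexamples turned up, so the statement at least looks plausible numerically.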
As a side note, one of the best things about information theory is that it can be used to prove unusual determinant identities and inequalities, such as Hadamard's inequality. I am only looking for an "information-theoretic" proof, not any other kind. As an example of the sort of argument I mean:
Prove that if $K_1$ and $K_2$ are s.p.d.m, then $$\det (K_1 + K_2) \geq \max \{\det(K_1),\det(K_2)\}$$ Proof:
Let $X \sim N(0,K_1)$ and $Y \sim N(0,K_2)$ be independent of each other. Then (here $n$ is the dimension of the matrices and $h(\cdot)$ is the differential entropy) $$h(X+Y) \geq h(X+Y|X) = h(Y),$$ where the inequality holds because conditioning cannot increase entropy and the equality follows from independence. Since $X+Y \sim N(0,K_1+K_2)$, this reads $$\frac{1}{2}\log((2\pi e)^n\det(K_1+K_2)) \geq \frac{1}{2}\log((2\pi e)^n\det(K_2)),$$ i.e. $\det(K_1+K_2) \geq \det(K_2)$. Swapping the roles of $K_1$ and $K_2$ gives $\det(K_1+K_2) \geq \det(K_1)$, which proves the claim.
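(The same kind of quick numpy check as above also confirms this inequality on random instances; again just a sketch, not part of the proof:)

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(n):
    """A random symmetric positive definite n x n matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

n = 4
for _ in range(1000):
    K1, K2 = random_spd(n), random_spd(n)
    lhs = np.linalg.det(K1 + K2)
    rhs = max(np.linalg.det(K1), np.linalg.det(K2))
    assert lhs >= rhs * (1 - 1e-9)  # det(K1 + K2) >= max{det(K1), det(K2)}
```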
Here is my attempt at the original problem:
We need to show that for $\lambda \in (0,1)$, $$\log\left(\frac{\det(\lambda K_1 + (1-\lambda)K_2 +K_0 )}{\det(\lambda K_1 + (1-\lambda)K_2)}\right) \leq \lambda \log\left(\frac{\det(K_1+K_0)}{\det(K_1)}\right) + (1-\lambda)\log\left(\frac{\det(K_2+K_0)}{\det(K_2)}\right)$$ Let $X_1 \sim N(0,K_1)$, $X_2 \sim N(0,K_2)$ and $Y\sim N(0,K_0)$ be mutually independent. Let $v$ be a random variable, independent of $(X_1,X_2,Y)$, which equals $1$ w.p. $\lambda$ and $2$ w.p. $1-\lambda$. Let $K_v =\lambda K_1 + (1-\lambda)K_2$. Note that $E[X_vX_v^T]= K_v$. Let $\widetilde{X}\sim N(0,K_v)$ be independent of $Y$. Note that $X_v$ is not necessarily multivariate Gaussian. Hence it boils down to showing $$h(\widetilde{X}+Y) - h(\widetilde{X}) \leq h(X_v+Y|v) - h(X_v|v)$$
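To spell out why that last inequality would suffice: $\widetilde{X}$ and $\widetilde{X}+Y$ are Gaussian with covariances $K_v$ and $K_v+K_0$, and conditioning on $v$ splits the entropies of $X_v$ and $X_v+Y$ into their Gaussian components, so $$h(\widetilde{X}+Y) - h(\widetilde{X}) = \frac{1}{2}\log\left(\frac{\det(K_v+K_0)}{\det(K_v)}\right),$$ $$h(X_v+Y|v) - h(X_v|v) = \frac{\lambda}{2}\log\left(\frac{\det(K_1+K_0)}{\det(K_1)}\right) + \frac{1-\lambda}{2}\log\left(\frac{\det(K_2+K_0)}{\det(K_2)}\right),$$ and multiplying the claimed entropy inequality by $2$ gives exactly the convexity inequality above.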
I am not sure whether the inequality above is even true; I only know that it would imply the result. In any case, I was able to get $$h(\widetilde{X}) \geq h(X_v) \geq h(X_v|v),$$ since $X_v$ and $\widetilde{X}$ have the same covariance matrix and the Gaussian maximizes differential entropy for a given covariance, while conditioning cannot increase entropy.
But I couldn't get a similar chain for $h(\widetilde{X}+Y)$; see the note below on why it goes the wrong way. I tried to use the entropy power inequality here, but it didn't seem to work. My guess is that this approach won't work and that I should instead try to manipulate mutual information terms or something.
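To make precise what goes wrong: $X_v+Y$ also has covariance $K_v+K_0$, so the same two facts only give $$h(\widetilde{X}+Y) \geq h(X_v+Y) \geq h(X_v+Y|v),$$ which bounds $h(\widetilde{X}+Y)$ from below rather than from above, i.e. the opposite of the direction needed for the sum term.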
I'd appreciate any ideas on this. As always, if something is unclear, let me know.