I am trying to understand the code from pybasicbayes, which defines the following function to sample from an inverse Wishart:
```python
import numpy as np
import scipy.linalg

def sample_invwishart(S, nu):
    n = S.shape[0]
    chol = np.linalg.cholesky(S)      # lower triangular, chol @ chol.T == S
    x = np.random.randn(nu, n)        # nu iid standard normal rows of length n
    R = np.linalg.qr(x, 'r')          # n x n upper triangular factor of x
    T = scipy.linalg.solve_triangular(R.T, chol.T, lower=True).T  # T == chol @ inv(R)
    return T @ T.T
```

I am trying to understand how this works, i.e. why it actually returns random variables with the desired distribution.
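As a sanity check (my own script, not part of pybasicbayes), the empirical mean of the samples should approach $S/(\nu - n - 1)$, the mean of an inverse Wishart with scale $S$ and $\nu > n + 1$ degrees of freedom, and it seems to:

```python
np.random.seed(0)
n, nu = 3, 10
A = np.random.randn(n, n)
S = A @ A.T + n * np.eye(n)           # a random positive definite scale matrix

# Monte Carlo estimate of the mean; should approach S / (nu - n - 1).
emp_mean = sum(sample_invwishart(S, nu) for _ in range(50_000)) / 50_000
print(np.max(np.abs(emp_mean - S / (nu - n - 1))))  # small
```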
Let's say I was trying to sample a Wishart distribution (i.e. the inverse of what is being sampled here). I would do the following. I would let chol_inv be the inverse of the Cholesky factor of $S$, which is $n \times n$; since chol @ chol.T == S, this gives chol_inv.T @ chol_inv == inv(S). Then, I would consider chol_inv.T @ x.T, which would give me an $n \times \nu$ matrix where each column is an iid draw from a mean-zero multivariate Gaussian in $n$ dimensions with covariance $S^{-1}$. So, finally, to sample the Wishart distribution, I would return chol_inv.T @ x.T @ x @ chol_inv, which is the desired $n \times n$ matrix.
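In code, my reading of that recipe would be something like the following sketch (my own, with the transposes made explicit; the name sample_wishart is hypothetical):

```python
def sample_wishart(S, nu):
    """Sketch: draw from a Wishart with scale inv(S) and nu degrees of freedom."""
    n = S.shape[0]
    chol = np.linalg.cholesky(S)          # chol @ chol.T == S
    chol_inv = np.linalg.inv(chol)        # chol_inv.T @ chol_inv == inv(S)
    x = np.random.randn(nu, n)
    y = chol_inv.T @ x.T                  # n x nu; columns are iid N(0, inv(S))
    return y @ y.T                        # distributed as Wishart(inv(S), nu)
```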
So, I know that one approach that would give us a sample from an inverse Wishart would be to return np.linalg.inv(chol_inv.T @ x.T @ x @ chol_inv), but I can see why this may not be desirable, because it involves two matrix inversions (one to form chol_inv and one for the final result).
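In terms of the sketch above, that double-inversion approach would be (again, a hypothetical name):

```python
def sample_invwishart_naive(S, nu):
    # The inverse of a Wishart(inv(S), nu) draw is an inverse Wishart(S, nu) draw,
    # but this inverts a matrix on top of the inversion inside sample_wishart.
    return np.linalg.inv(sample_wishart(S, nu))
```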
As I understand it, here is what the code above is doing, in mathematical terms.
We consider the matrix $T$, which is the $n \times n$ matrix satisfying T @ R = chol; this is what the triangular solve computes, i.e. $T = \mathrm{chol}\, R^{-1}$. Here $R$ is the $n \times n$ upper-triangular factor of the reduced QR factorization $x = Q R$ (this is what np.linalg.qr(x, 'r') returns when $\nu \geq n$), where $Q$ is a $\nu \times n$ matrix with orthonormal columns, so that $Q^T Q = I_n$.

Now, we could instead consider the $n \times \nu$ matrix $T' := T Q^T$, because T' @ T'.T = T @ Q.T @ Q @ T.T = T @ T.T, i.e. using $T'$ instead of $T$ would result in the same return value. This transformation is meaningful because chol = T @ R = T @ Q.T @ Q @ R = T' @ x.
If we are in the setting where $\nu = n$, so that x is square and invertible and $Q$ is a genuine orthogonal matrix, the result holds immediately. But what if $\nu > n$? I guess it still goes through because $Q^T Q = I_n$ holds for the reduced factorization even though $Q Q^T \neq I_\nu$?
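A quick numerical check (my own script, reusing the imports from the first snippet) seems to confirm the identities for $\nu > n$:

```python
np.random.seed(1)
n, nu = 3, 10                      # deliberately nu > n
A = np.random.randn(n, n)
S = A @ A.T + n * np.eye(n)
chol = np.linalg.cholesky(S)

x = np.random.randn(nu, n)
Q, R = np.linalg.qr(x)             # reduced QR: Q is (nu, n), R is (n, n)
T = scipy.linalg.solve_triangular(R.T, chol.T, lower=True).T
T_prime = T @ Q.T                  # (n, nu)

print(np.allclose(T @ R, chol))                   # True: T @ R == chol
print(np.allclose(T_prime @ x, chol))             # True: T' @ x == chol
print(np.allclose(T_prime @ T_prime.T, T @ T.T))  # True: same return value
```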
But still, I don't quite understand:
- why is this approach more computationally efficient?
- how would anyone have thought of this?
- is this a general trick? Could similar approaches be used in other settings?