
I am trying to understand the code from pybasicbayes, which defines the following function to sample from an inverse Wishart:

    import numpy as np
    import scipy.linalg

    def sample_invwishart(S, nu):
        n = S.shape[0]
        chol = np.linalg.cholesky(S)                  # lower-triangular factor, S = chol @ chol.T
        x = np.random.randn(nu, n)                    # nu iid standard-normal rows
        R = np.linalg.qr(x, 'r')                      # n x n upper-triangular factor of x
        T = scipy.linalg.solve_triangular(R.T, chol.T, lower=True).T   # T = chol @ inv(R)
        return T @ T.T
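(For reference, a minimal usage sketch would look like this, assuming the definition and imports above are in scope; the 3×3 scale matrix and the degrees of freedom are arbitrary values chosen only for illustration.)

    # usage sketch: arbitrary SPD scale matrix and degrees of freedom
    A = np.random.randn(3, 3)
    S_example = A @ A.T + 3 * np.eye(3)        # symmetric positive definite
    draw = sample_invwishart(S_example, nu=10)
    print(draw)                                # a 3 x 3 symmetric positive-definite sample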

I am trying to understand how this works, i.e. why it actually returns random variables with the desired distribution.

Let's say I was trying to sample from a Wishart distribution (i.e. the inverse of what is being sampled here). I would do the following. I would let chol_inv be the inverse of the transpose of the Cholesky factor of $S$, an $n \times n$ matrix satisfying chol_inv @ chol_inv.T $= S^{-1}$. Then I would consider chol_inv @ x.T, which would give me an $n \times \nu$ matrix where each column is an i.i.d. draw from a mean-zero multivariate Gaussian in $n$ dimensions with covariance $S^{-1}$. So, finally, to sample the Wishart distribution, I would return chol_inv @ x.T @ x @ chol_inv.T, which is the desired $n \times n$ matrix.

So, I know that one approach that would give us a sample from an inverse Wishart would be to return np.linalg.inv(chol_inv @ x.T @ x @ chol_inv.T), but I can see why this approach may not be desirable, because it would involve taking an inverse twice.
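To make this concrete, here is a sketch of that naive route (my own illustration, not the pybasicbayes code), under the convention that chol_inv @ chol_inv.T $= S^{-1}$:

    import numpy as np

    def sample_invwishart_naive(S, nu):
        # my own sketch of the "invert a Wishart draw" approach, not the library code
        n = S.shape[0]
        chol_inv = np.linalg.inv(np.linalg.cholesky(S).T)   # chol_inv @ chol_inv.T == inv(S)
        x = np.random.randn(nu, n)
        wishart = chol_inv @ x.T @ x @ chol_inv.T            # Wishart(nu, inv(S)) draw
        return np.linalg.inv(wishart)                        # inverse-Wishart(nu, S) draw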

As I understand it, here is what the code above is doing, written out in math.

We consider the matrix $T$, which is the $n \times n$ matrix satisfying T @ R = chol. Here $R$ is the $n \times n$ upper-triangular factor returned by np.linalg.qr(x, 'r'), i.e. we have the (reduced) QR decomposition $X = QR$, where $Q$ is a $\nu \times n$ matrix with orthonormal columns.

Now, we could instead consider $T' := T Q^T$, because T' @ T'.T = T @ Q.T @ Q @ T.T = T @ T.T (since $Q^T Q = I$), i.e. using $T'$ instead of $T$ would give the same return value. This transformation is meaningful because chol = T @ R = T @ Q.T @ Q @ R = T' @ x.

If we were in the setting where $\nu = n$, so that x is square and invertible, the result would hold immediately. But what about the setting where $\nu > n$? I guess this still holds, using pseudo-inverses?
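(To check my reading of the algebra, the identities can be verified numerically; the sizes below are arbitrary, with $\nu > n$.)

    # sanity check of T @ R == chol and T @ T.T == chol @ inv(x.T @ x) @ chol.T
    import numpy as np
    import scipy.linalg

    n, nu = 4, 9                                   # arbitrary sizes with nu > n
    A = np.random.randn(n, n)
    S = A @ A.T + n * np.eye(n)

    chol = np.linalg.cholesky(S)
    x = np.random.randn(nu, n)
    R = np.linalg.qr(x, 'r')                       # n x n upper-triangular factor
    T = scipy.linalg.solve_triangular(R.T, chol.T, lower=True).T   # T = chol @ inv(R)

    print(np.allclose(T @ R, chol))                                       # expected: True
    print(np.allclose(T @ T.T, chol @ np.linalg.inv(x.T @ x) @ chol.T))   # expected: True

The second check is what ties the two routes together: T @ T.T agrees with the inverse of the Wishart draw built from the same x.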

But still, I don't quite understand:

  • why this approach is more computationally efficient,
  • how anyone would have thought of this,
  • whether this is a general trick, i.e. whether similar approaches could be used in other settings.

1 Answer

There's not a huge difference in computational complexity. Your approach has five $\Omega(n^3)$ operations (three multiplications and two inverses) and the code has four (computing chol, R, T and the return value), but the QR decomposition is 2-3 times slower than any of the others. For large $\nu$, both the QR decomposition and the formation of $X^TX$ will be $\Omega(\nu n^2)$. It's going to be within a factor of two or so in operation count, and for large $n$ or $\nu$ the actual times will be dominated by getting numbers on and off the chip.
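(A rough timing sketch along these lines would show the gap on a particular machine; the sizes are arbitrary and the numbers will depend on your BLAS and hardware.)

    # rough timing sketch -- arbitrary sizes, results depend on BLAS/hardware
    import timeit
    import numpy as np
    import scipy.linalg

    n, nu = 200, 500
    A = np.random.randn(n, n)
    S = A @ A.T + n * np.eye(n)
    x = np.random.randn(nu, n)

    def via_qr():                      # the pybasicbayes-style route
        chol = np.linalg.cholesky(S)
        R = np.linalg.qr(x, 'r')
        T = scipy.linalg.solve_triangular(R.T, chol.T, lower=True).T
        return T @ T.T

    def via_double_inverse():          # the "invert a Wishart draw" route
        chol_inv = np.linalg.inv(np.linalg.cholesky(S).T)
        wishart = chol_inv @ x.T @ x @ chol_inv.T
        return np.linalg.inv(wishart)

    print(timeit.timeit(via_qr, number=100))
    print(timeit.timeit(via_double_inverse, number=100))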

The code is a bit more numerically stable, because it doesn't have to form $X^TX$, which is where most of the potential rounding error comes in. That's probably not a big issue with modern floating point, but it might matter if you're doing single precision on a GPU or something.

The reason you'd think of this is that computational linear algebra folk (at least in statistics) think of everything in terms of triangular decompositions rather than inverses -- partly because the increased stability used to matter a lot. And yes, it's a general trick: rather than forming and inverting $X^TX$ you can pretty often work with a triangular decomposition of $X$. That's how linear regression is often done: work with a QR decomposition of $X$ and solve the triangular linear system $R\hat\beta = Q^TY$ rather than inverting $X^TX$.
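As a sketch of that regression recipe (made-up data, just to show the pattern):

    # least squares via QR: solve R @ beta = Q.T @ y instead of inverting X.T @ X
    import numpy as np
    import scipy.linalg

    X = np.random.randn(100, 3)                            # made-up design matrix
    beta_true = np.array([1.0, -2.0, 0.5])
    y = X @ beta_true + 0.1 * np.random.randn(100)

    Q, R = np.linalg.qr(X)                                 # reduced QR: Q is 100 x 3, R is 3 x 3
    beta_hat = scipy.linalg.solve_triangular(R, Q.T @ y)   # upper-triangular solve
    print(beta_hat)                                        # close to beta_true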
