
I am trying to optimize an objective function $L(\theta)$ in which some of the parameters I aim to recover belong to a covariance matrix $\Sigma$. $\Sigma$ has a particular structure, with ones on the diagonal (i.e., it is a correlation matrix): $$ \Sigma = \begin{pmatrix} 1 & & \\ \sigma_{21} & 1 & \\ \sigma_{31} & \sigma_{32} & 1 \end{pmatrix} $$

If $\Sigma$ had no such restrictions, one would typically include the lower-triangular Cholesky decomposition, $L$, in $\theta$ and pass it to the objective function, and then recreate $\Sigma = L L'$ inside the objective function. This ensures that, regardless of the values that the optimizer tries, you always have a valid covariance matrix (symmetric and positive semidefinite).
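This standard unconstrained parameterization can be sketched in a few lines of numpy (a minimal 3×3 example; the function name and the row-major layout of $\theta$ are hypothetical choices, not from the original post):

```python
import numpy as np

def theta_to_sigma(theta):
    """Rebuild Sigma = L L' from the 6 free entries of a 3x3
    lower-triangular Cholesky factor (unrestricted case)."""
    L = np.zeros((3, 3))
    L[np.tril_indices(3)] = theta  # fill the lower triangle row by row
    return L @ L.T

# Any real-valued theta yields a symmetric positive semidefinite Sigma:
theta = np.array([1.0, 0.3, 0.8, -0.5, 0.2, 1.2])
Sigma = theta_to_sigma(theta)
```

Because $LL^\top$ is symmetric positive semidefinite by construction, the optimizer can move freely in $\mathbb{R}^6$ without ever producing an invalid covariance matrix.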

In my case, because $\Sigma$ has ones on the diagonal, only three parameters are identified, which means that I can only pass three parameters from the Cholesky decomposition to my objective function. Let the Cholesky decomposition be written as $$ L = \begin{pmatrix} x_{11} & 0 & 0\\ x_{21} & x_{22} & 0 \\ x_{31} & x_{32} & x_{33} \\ \end{pmatrix} $$

One option is to pass the off-diagonal parameters of $L$, namely $x_{21}, x_{31}, x_{32}$, and then set $x_{11} = 1$, $x_{22} = \sqrt{1 - x_{21}^2}$, and $x_{33} = \sqrt{1 - x_{31}^2 - x_{32}^2}$ inside the objective function. If I did this, then $LL' = \Sigma.$ However, this is suboptimal from an optimization standpoint, because $x_{21}, x_{31}, x_{32}$ cannot vary freely over the parameter space, but rather must satisfy a series of nonlinear constraints $(x_{21}^2 < 1, x_{31}^2 + x_{32}^2 < 1)$.
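The pitfall can be made concrete with a short numpy sketch of this naive parameterization (the helper name is hypothetical):

```python
import numpy as np

def constrained_sigma(x21, x31, x32):
    """Naive parameterization with unit diagonal: breaks down whenever
    the arguments of the square roots go negative (NaN entries)."""
    x22 = np.sqrt(1 - x21**2)            # requires x21^2 < 1
    x33 = np.sqrt(1 - x31**2 - x32**2)   # requires x31^2 + x32^2 < 1
    L = np.array([[1.0, 0.0, 0.0],
                  [x21, x22, 0.0],
                  [x31, x32, x33]])
    return L @ L.T
```

Inside the feasible region this does return a correlation matrix, but a line-search step such as $(x_{31}, x_{32}) = (0.8, 0.8)$ would put `x33` outside the domain of the square root.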

What is the best way of passing three parameters to the likelihood function, such that I can reconstruct $\Sigma$ inside the objective function without running into problems arising from taking the square root of a negative number?

  • Are you using an environment with automatic differentiation (e.g. tensorflow, pytorch, jax), or are you planning on computing gradients manually? Commented Feb 1, 2024 at 22:23
  • But yeah, generically, one option would be to parameterize $\mathbf{L}$ as a lower-triangular matrix per usual, then map it onto the correlation matrices in two steps: 1) $\tilde{\Sigma}=\mathbf{L}\mathbf{L}^\top$, as usual, followed by 2) $\Sigma = \mathbf{D}\tilde{\Sigma}\mathbf{D}$, where $\mathbf{D}$ is a diagonal matrix whose elements are the inverse square roots of the diagonal elements of $\tilde{\Sigma}$. Commented Feb 1, 2024 at 22:26
  • I will not be using automatic differentiation. I'll either approximate the gradient numerically or compute it analytically. And thank you, your solution works. In the example above, I would pass $x_{21}, x_{31}, x_{32}$ to the objective function, set them equal to the lower-triangular elements of $L$, and set the diagonals equal to 1. Then steps 1) and 2) above yield a valid covariance matrix with the restrictions I want, regardless of the values of $x_{21}, x_{31}, x_{32}$. Commented Feb 2, 2024 at 1:44
  • Cool. I ask because it seems like it would be a pain to compute the gradient analytically, but if you really do only have a 3-by-3 matrix, numerical gradient computation may suffice. Commented Feb 2, 2024 at 14:59
  • Since my comment seems to have answered the question, I went ahead and made it an actual answer, as CV etiquette demands. Commented Feb 2, 2024 at 15:52

1 Answer


We can use parameters organized in the usual manner:

$$ L = \begin{pmatrix} x_{11} & 0 & 0\\ x_{21} & x_{22} & 0 \\ x_{31} & x_{32} & x_{33} \\ \end{pmatrix} $$

We then map this onto the set of correlation matrices in a two-step process.

First, mapping it onto the set of covariance matrices, per usual: $$ \tilde{\Sigma} = \mathbf{L}\mathbf{L}^\top $$

Second, mapping this covariance matrix onto the set of correlation matrices by scaling its rows and columns by the inverse square roots of the variances. In linear-algebraic terms, this means forming a diagonal matrix $\mathbf{D}$: $$ \textrm{diag}(\mathbf{D}) = \Bigg[ \frac{1}{\sqrt{\tilde\Sigma_{1,1}}}, \ldots, \frac{1}{\sqrt{\tilde\Sigma_{N,N}}}\Bigg] $$

and then pre and postmultiplying $\tilde{\Sigma}$ by this matrix:

$$ \Sigma = \mathbf{D}\tilde\Sigma\mathbf{D} $$

When implementing this on a computer, it is more efficient to scale the rows and columns of $\tilde{\Sigma}$ directly, rather than explicitly computing the two matrix products.
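The two steps above, with the direct row/column scaling, can be sketched as follows (a minimal numpy example for the 3×3 case in the question; the function name is hypothetical, and the unit diagonal of $L$ follows the comment thread, though any positive diagonal would give the same image):

```python
import numpy as np

def chol_to_corr(x21, x31, x32):
    """Map three unconstrained reals to a valid 3x3 correlation matrix."""
    L = np.array([[1.0, 0.0, 0.0],
                  [x21, 1.0, 0.0],
                  [x31, x32, 1.0]])
    S = L @ L.T                    # step 1: Sigma-tilde, a covariance matrix
    d = 1.0 / np.sqrt(np.diag(S))  # inverse square roots of the variances
    return S * np.outer(d, d)      # step 2: scale rows and columns (= D S D)

# Works for arbitrary real inputs, even far outside the naive constraints:
Sigma = chol_to_corr(5.0, -2.0, 0.7)
```

The elementwise product with `np.outer(d, d)` is exactly $\mathbf{D}\tilde\Sigma\mathbf{D}$, but costs $O(N^2)$ rather than two $O(N^3)$ matrix multiplications.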
