
Let the square nonsingular matrix $M$ be a given convergent matrix. What are the best scalar values of $\alpha$ and $\beta$ (in the real domain) at which the following quadratic matrix equation approximately holds:

\begin{equation} M^2+\alpha M+\beta I\approx0. \end{equation}

In fact, I would like to find the scalar values as accurately as possible for the following matrix $M$ (as an example):

Clear["Global`*"] SeedRandom[12345] n = 1000; Id = SparseArray[{i_, i_} -> 1., {n, n}, 0.]; A = RandomReal[{-1, 1}, {n, n}]; X = 1/SingularValueList[A, 1][[1]]^2 ConjugateTranspose[A]; M = A.X; 

I also tried to use NMinimize and FindInstance, but I failed. Please note that a rapid approach is preferable for me, so that I can find more accurate values for the scalar parameters.

  • What specifically are you trying to minimize? The sum of squares of elements of M^2+alpha*M+beta*I (Frobenius norm)? Another norm? Commented Aug 20, 2013 at 21:57
  • The applied norm is optional (Frobenius or Infinity). Just finding some approximate values for the scalars in any fast way is needed. Commented Aug 21, 2013 at 6:40

2 Answers


One method is to use Mathematica's PseudoInverse[] on the matrix equation that involves the eigenvalues of $M$. To arrive at this matrix equation, multiply each side of $M^2 + \alpha M + \beta I \approx 0$ by an eigenvector $v_1$ of $M$, to get

$$ (\lambda_1^2 + \alpha \lambda_1 + \beta) v_1 \approx 0 \implies \lambda_1^2 + \alpha \lambda_1 + \beta\approx 0,$$ where $\lambda_1$ is the corresponding eigenvalue for $v_1$. Repeat this for each eigenvector (assuming that $M$ is semi-simple); using three eigenvalues, for example, you arrive at the following approximate matrix equation

$$\begin{pmatrix}\lambda_1 & 1 \\ \lambda_2 & 1 \\ \lambda_3 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta\end{pmatrix} \approx - \begin{pmatrix} \lambda_1^2 \\ \lambda_2^2 \\ \lambda_3^2 \end{pmatrix}. $$ To solve this use in Mathematica

{α, β} = -PseudoInverse[{{λ1, 1}, {λ2, 1}, {λ3, 1}}].{λ1^2, λ2^2, λ3^2}

Then all you need are the eigenvalues of $M$, which you can get through

{λ1,λ2,λ3} = Eigenvalues[M] 

For a very large matrix it may be impractical to find all of $M$'s eigenvalues, in which case you can use the Power Method to obtain, for example, the three largest eigenvalues and then use the above method to find α and β. The more eigenvalues you use, the better the approximation.
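For the 1000 x 1000 matrix in the question, a minimal sketch of this approach might look as follows. It assumes M and n from the question's code, uses Eigenvalues[M, k] for the k largest eigenvalues instead of a hand-rolled power method, and the choice k = 20 is purely illustrative:

k = 20;                                              (* number of leading eigenvalues; illustrative *)
evals = Eigenvalues[M, k];                           (* k largest eigenvalues of M *)
lhs = Transpose[{evals, ConstantArray[1., k]}];      (* rows {λi, 1} of the design matrix *)
{α, β} = -PseudoInverse[lhs].(evals^2);              (* least-squares fit of λ^2 + α λ + β ≈ 0 *)
Norm[M.M + α M + β IdentityMatrix[n], "Frobenius"]   (* residual of M^2 + α M + β I *)

Increasing k trades speed for accuracy, in line with the remark above.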

Hope this helps.

  • If the OP doesn't have a version with PseudoInverse, then he's not using Mathematica, as it has been there since v1. But it was last modified in v5, so at a minimum he should use that version. Commented Aug 21, 2013 at 14:06
  • Running the above with M = DiagonalMatrix[{1,2,3}], I get {4, -(10/3)}. But, plugging this into M.M + α M + β IdentityMatrix[3] gives DiagonalMatrix[{5/3, 26/3, 53/3}]. So, I don't think your method is effective. Although, no solution may exist for the matrix I chose. Any way to address that? Commented Aug 21, 2013 at 14:19
  • Hi there, my result for this calculation gave {{1/3, 0, 0}, {0, -(2/3), 0}, {0, 0, 1/3}}, which is not entirely terrible. In any case, the bigger the spread of the eigenvalues, the worse the approximation. If M has rank 100 with "strongly" different eigenvalues, then game over, no good approximation exists. On the other hand, if you use a rank-2 matrix, such as M = Outer[Times, z, w] + Outer[Times, x, y], for any lists (vectors) z, w, x and y, then the solution will be exact. @rcollyer, thanks for your comments, I've edited my post accordingly. Commented Aug 21, 2013 at 15:14

By construction, M is real symmetric and almost certainly positive definite.
The eigenvalues of M are e = (d/d[[1]])^2, where d is the vector of singular values of A.
The eigenvalues of C = M^2 + alpha M + beta I are e^2 + alpha e + beta.
The eigenvectors of M and C are the same as the left singular vectors of A, say V.
The offdiagonals of V'CV are zero.
The sum of squares of the diagonals of V'CV = the sum of squares of all the elements of C.
But the diagonals of V'CV are just the eigenvalues of C, so we can simplify the problem:
minimize Tr[(e^2 + alpha e + beta)^2].
You already know e, because you constructed it from d, so the solution is immediate:
{alpha = Covariance[e,-e^2]/Variance[e], beta = Mean[-e^2] - alpha*Mean[e]}.
That's the exact least-squares solution.
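A minimal sketch of that computation, assuming A, n, and M from the question's code (note that the question only computed the largest singular value; here SingularValueList[A] is used to get all of them):

d = SingularValueList[A];                                   (* all singular values of A, decreasing *)
e = (d/d[[1]])^2;                                           (* eigenvalues of M *)
alpha = Covariance[e, -e^2]/Variance[e];
beta = Mean[-e^2] - alpha*Mean[e];
{alpha, beta}
Norm[M.M + alpha M + beta IdentityMatrix[n], "Frobenius"]   (* residual of M^2 + alpha M + beta I *)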

EDIT - It occurred to me that, although getting the singular values of a 1000 x 1000 matrix takes only a few seconds, if you have to do it many times (e.g., inside a loop) then the total time may be prohibitive, so further analysis may help.

Simulations suggest that if the elements of A are iid Uniform[-1,1] then the expected value of e is close to Reverse[(#/2 + ArcSin[#]/Pi)^2 & [Range@n/n]]. This code

{n = #,
   y = -(x = (#/2 + ArcSin[#]/Pi)^2 &[Range@n/n])^2;
   N@{a = (n x.y - (Tr@x) (Tr@y))/(n x.x - (Tr@x)^2),
      b = (Tr@y - a*Tr@x)/n}} & /@ {10, 100, 1000}

gives {alpha,beta} pairs for n = {10, 100, 1000}.

{{10, {-0.939996, 0.104015}}, {100, {-0.764133, 0.0667884}}, {1000, {-0.743093, 0.0626846}}}

The pair for n = 1000 should be a good approximation to the actual least-squares coefficients for a sample 1000 x 1000 matrix.

EDIT 2 - I get the same results when the elements of A are normal and logistic as when they're uniform. Has taking n = 1000 put us in asymptoticland? Would someone please check these results?

EDIT 3 - I think what we're seeing is a result of the quarter-circle law: the asymptotic density of the singular values of an n x n matrix whose elements are iid zero-mean random variables with variance 1/n converges to Sqrt[4 - x^2]/Pi, 0 <= x <= 2. The singular values we are dealing with have been normalized -- i.e., divided by the largest one -- so (rounding some statistical corners) their asymptotic density is approximately proportional to Sqrt[1 - x^2], 0 <= x <= 1, and their cumulative distribution is approximately F[x] := 2 (x Sqrt[1 - x^2] + ArcSin[x])/Pi.

The function I gave in my first edit, say K[p] := p/2 + ArcSin[p]/Pi, is an approximate inverse of F[x]. The actual inverse has no closed form and must be found numerically, say X[p] := Block[{x}, x /. FindRoot[F[x] == p, {x, K[p]}]]. For n = 1000, the rms error in approximating the normalized singular values using X[p] is typically much smaller than the rms error using K[p]; e.g., 0.00095 vs. 0.0091.
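A sketch of how one might check that last claim numerically, assuming n from the question and d = SingularValueList[A] from the sketch above. The midpoint grid (Range[n] - 1/2)/n is an assumption made here to avoid the singular derivative of F at p = 1; the first edit used Range@n/n, so the rms numbers will differ slightly:

F[x_] := 2 (x Sqrt[1 - x^2] + ArcSin[x])/Pi;                (* limiting CDF *)
K[p_] := p/2 + ArcSin[p]/Pi;                                (* approximate inverse of F *)
X[p_] := Block[{x}, x /. FindRoot[F[x] == p, {x, K[p]}]];   (* numerical inverse of F *)
s = Reverse[d/d[[1]]];                                      (* normalized singular values, ascending *)
p = N[(Range[n] - 1/2)/n];                                  (* probability grid; assumption, see above *)
{RootMeanSquare[s - (X /@ p)], RootMeanSquare[s - K[p]]}    (* rms errors using X and K *)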

