By construction, `M` is real symmetric and almost certainly positive definite.
The eigenvalues of `M` are `e = (d/d[[1]])^2`, where `d` is the vector of singular values of `A`.
The eigenvalues of `C = M^2 + alpha M + beta I` (where `I` is the identity matrix) are `e^2 + alpha e + beta`.
The eigenvectors of `M` and `C` are the same as the left singular vectors of `A`, say `V`.
So `V'CV` is diagonal: its off-diagonal entries are zero.
Because the Frobenius norm is invariant under orthogonal transformations, the sum of squares of the diagonal entries of `V'CV` equals the sum of squares of all the elements of `C`.
But the diagonal entries of `V'CV` are just the eigenvalues of `C`, so we can simplify the problem:
minimize `Tr[(e^2 + alpha e + beta)^2]`.
You already know `e`, because you constructed it from `d`, so the solution is immediate:
`{alpha = Covariance[e,-e^2]/Variance[e], beta = Mean[-e^2] - alpha*Mean[e]}`.
That's the exact least-squares solution.
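For concreteness, here is a minimal sketch of the whole computation for one sample matrix (n = 1000 and Uniform[-1,1] entries as above; everything else is straight from the formulas):

    (* minimal sketch: exact least-squares alpha and beta for one random A *)
    n = 1000;
    A = RandomReal[{-1, 1}, {n, n}];   (* iid Uniform[-1,1] entries *)
    d = SingularValueList[A];          (* singular values, decreasing *)
    e = (d/d[[1]])^2;                  (* eigenvalues of M *)
    {alpha = Covariance[e, -e^2]/Variance[e],
     beta = Mean[-e^2] - alpha*Mean[e]}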
EDIT - It occurred to me that although getting the singular values of a 1000 x 1000 matrix takes only a few seconds, having to do it many times (e.g., inside a loop) could make the total time prohibitive, so further analysis may help.
Simulations suggest that if the elements of `A` are iid Uniform[-1,1] then the expected value of `e` is close to `Reverse[(#/2 + ArcSin[#]/Pi)^2 & [Range@n/n]]`. This code
    {n = #,
      x = (#/2 + ArcSin[#]/Pi)^2 & [Range@n/n]; (* approximate expected e *)
      y = -x^2;                                 (* regression target -e^2 *)
      N@{a = (n x.y - (Tr@x)(Tr@y))/(n x.x - (Tr@x)^2), (* OLS slope = alpha *)
         b = (Tr@y - a*Tr@x)/n}                         (* OLS intercept = beta *)
      } & /@ {10, 100, 1000}
gives `{n, {alpha, beta}}` pairs for `n = {10, 100, 1000}`:
    {{10, {-0.939996, 0.104015}},
     {100, {-0.764133, 0.0667884}},
     {1000, {-0.743093, 0.0626846}}}
The pair for `n = 1000` should be a good approximation to the actual least-squares coefficients for a sample 1000 x 1000 matrix.
EDIT 2 - I get the same results when the elements of `A` are normal or logistic as when they're uniform. Has taking n = 1000 put us in asymptoticland? **Would someone please check these results?**
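For anyone who wants to check, here is a sketch of the comparison I have in mind (the helper name `exactAB` is just for illustration; the three distributions are the ones mentioned above):

    (* sketch: exact {alpha, beta} for one 1000 x 1000 sample from each distribution *)
    exactAB[dist_, nn_] := Module[{d, e, alpha},
      d = SingularValueList[RandomVariate[dist, {nn, nn}]];
      e = (d/d[[1]])^2;
      alpha = Covariance[e, -e^2]/Variance[e];
      {alpha, Mean[-e^2] - alpha*Mean[e]}]

    exactAB[#, 1000] & /@ {UniformDistribution[{-1, 1}],
      NormalDistribution[], LogisticDistribution[0, 1]}

If all three `{alpha, beta}` pairs land near the asymptotic `{-0.743093, 0.0626846}` from the table above, that would support both the approximation and the distribution-independence claim.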