Calculating Covariance Matrix in Matlab

Question

I am implementing a PCA algorithm in MATLAB. I see two different approaches to calculating the covariance matrix:

C = sampleMat.' * sampleMat ./ nSamples;

and

C = cov(data);

What is the difference between these two methods?

PS 1: When I use cov(data), is the following mean-subtraction step unnecessary?

meanSample = mean(data,1);
data = data - repmat(meanSample, nSamples, 1);

PS 2:

In the first approach, should I divide by nSamples or nSamples - 1?

Rody Oldenhuis · Accepted Answer · 2012-12-04 13:26:14Z

In short: cov mainly just adds convenience to the bare formula.

If you type

edit cov

You'll see a lot of stuff, with these lines all the way at the bottom:

xc = bsxfun(@minus,x,sum(x,1)/m);  % Remove mean
if flag
    xy = (xc' * xc) / m;
else
    xy = (xc' * xc) / (m-1);  % DEFAULT
end

which is essentially the same as your first line, save for the subtraction of the column-means.

Read the wiki on sample covariances to see why there is a minus-one in the default path.
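As a minimal sketch of that equivalence, the snippet below centers the columns by hand and compares both normalizations against cov. The data matrix, its size, and the variable names (centered, C_unbiased, C_biased) are made up for illustration; it assumes rows are samples and columns are variables, as in the question.

```matlab
% Hypothetical example data: 200 samples (rows), 3 variables (columns).
rng(0);                          % fixed seed, only so the run is repeatable
data = randn(200, 3);
nSamples = size(data, 1);

% Center each column, mirroring the bsxfun line in the cov.m excerpt.
centered = bsxfun(@minus, data, mean(data, 1));

% Bare formula with both normalizations.
C_unbiased = (centered.' * centered) / (nSamples - 1);  % default of cov(data)
C_biased   = (centered.' * centered) / nSamples;        % same as cov(data, 1)

% Largest absolute differences; these should be at round-off level.
disp(max(max(abs(C_unbiased - cov(data)))));
disp(max(max(abs(C_biased   - cov(data, 1)))));
```

Without the centering step, the bare formula gives the second-moment matrix rather than the covariance, so the two only agree when the column means are already zero.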

Note however that your first line uses normal transpose (.'), whereas the cov-version uses conjugate-transpose ('). This will make the output of cov different in the context of complex-valued data.
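To see the effect of the two transposes, here is a small sketch on made-up complex data (the matrix z and the names C_plain and C_conj are illustrative only). It compares the bare formula with .' against the same formula with ', which is the form used in the cov.m excerpt.

```matlab
% Hypothetical complex-valued data: 50 samples, 2 variables.
rng(1);                          % fixed seed, only so the run is repeatable
z  = randn(50, 2) + 1i * randn(50, 2);
n  = size(z, 1);
zc = bsxfun(@minus, z, mean(z, 1));   % remove column means

C_plain = (zc.' * zc) / (n - 1);  % plain transpose, as in the question
C_conj  = (zc'  * zc) / (n - 1);  % conjugate transpose, as in the cov.m excerpt

% The conjugate-transpose version is Hermitian with a real diagonal
% (the variances); the plain-transpose version is generally not.
disp(max(abs(imag(diag(C_conj)))));   % imaginary part of the diagonal
disp(norm(C_conj - C_conj'));         % deviation from Hermitian symmetry
disp(norm(C_plain - C_conj));         % how far the two results are apart
```

For real-valued input, .' and ' are the same operation, so the distinction only matters once the data has a nonzero imaginary part.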

Also note that cov is a call to a non-built-in function. That means there will be a (possibly severe) performance penalty when using cov in a loop; Matlab's JIT compiler cannot accelerate non-built-in functions.

With the caveat that complex numbers are handled differently from the code in the question.
According to your edit 2, is it better to use the first line? And which one is correct, or does it make no difference whether the covariance is calculated with the conjugate transpose or the plain transpose?
@kamaci: it depends. If you need to calculate only 1 covariance matrix per run, it's just easier to use cov. If you need to do it hundreds of times in a loop, with different data sets, etc., using the bare formula will be much faster and is overall the better trade-off. As mentioned above: the output of cov will only be different from your first attempt, if your data is complex-valued. If it only contains real values, the outputs will be identical.
I will run it only once; however, my data is very large. Is using cov still OK in that case?
