7

I was wondering how can matlab multiply two matrices so fast. When multiplying two NxN matrices, N^3 multiplications are performed. Even with the Strassen Algorithm it takes N^2.8 multiplications, which is still a large number. I was running the following test program:

a = rand(2160); b = rand(2160); tic;a*b;toc 

2160 was used because 2160^3=~10^10 ( a*b should be about 10^10 multiplications)

I got:

Elapsed time is 1.164289 seconds. 

(I'm running on 2.4Ghz notebook and no threading occurs) which mean my computer made ~10^10 operation in a little more than 1 second.

How this could be??

6
  • 4
    Actually, the 'Ma' in Matlab stands for magic. Commented Oct 4, 2012 at 7:04
  • 1
    How do you know no threading occurs? Commented Oct 4, 2012 at 7:06
  • Are you sure it is computed on the CPU? mathworks.com/discovery/matlab-gpu.html Commented Oct 4, 2012 at 7:12
  • Matlab definitely multi-threads. I'm testing it on my machine right now and it's using 4 cores. Commented Oct 4, 2012 at 7:13
  • Matlab certainly does multi-thread, at least R2011b does with default settings and no interference from the o/s. Commented Oct 4, 2012 at 7:14

2 Answers 2

13

It's a combination of several things:

  • Matlab does indeed multi-thread.
  • The core is heavily optimized with vector instructions.

Here's the numbers on my machine: Core i7 920 @ 3.5 GHz (4 cores)

>> a = rand(10000); >> b = rand(10000); >> tic;a*b;toc Elapsed time is 52.624931 seconds. 

Task Manager shows 4 cores of CPU usage.

Now for some math:

Number of multiplies = 10000^3 = 1,000,000,000,000 = 10^12 Max multiplies in 53 secs = (3.5 GHz) * (4 cores) * (2 mul/cycle via SSE) * (52.6 secs) = 1.47 * 10^12 

So Matlab is achieving about 1 / 1.47 = 68% efficiency of the maximum possible CPU throughput.

I see nothing out of the ordinary.

Sign up to request clarification or add additional context in comments.

6 Comments

matrix matrix multiplication also performs adds, not only muls. I think that your performance estimates should include 4 FLOPs/cycle via SSE, but twice as many operations. Am I correct? And is your MATLAB not using AVX-enabled BLAS?
@angainor It's actually one add and one mul per cycle. Each one can be SSE. However, the add and muls are separate execution units, so you can't "double up" on one if you don't use the other.
Thats correct. Just for the sake of this analysis, adds should be included. The results are the same, just that you made a non-trivial shortcut. Might be hard to understand for someone who does not know what you wrote in the comment.
The OP was only counting multiplications. I didn't want to confuse him with additions as well (even though they come out to be exactly the same).
By the way, how is it with AVX? I just assumed its 8 FLOPs per cycle. Is it also split between adds and muls?
|
5

To check whether you do or not use multi-threading in MATLAB use this command

maxNumCompThreads(n) 

This sets the number of cores to use to n. Now I have a Core i7-2620M, which has a maximum frequency of 2.7GHz, but it also has a turbo mode with 3.4GHz. The CPU has two cores. Let's see:

A = rand(5000); B = rand(5000); maxNumCompThreads(1); tic; C=A*B; toc Elapsed time is 10.167093 seconds. maxNumCompThreads(2); tic; C=A*B; toc Elapsed time is 5.864663 seconds. 

So there is multi-threading.

Let's look at the single CPU results. A*B executes approximately 5000^3 multiplications and additions. So the performance of single-threaded code is

5000^3*2/10.8 = 23 GFLOP/s 

Now the CPU. 3.4 GHz, and Sandy Bridge can do maximum 8 FLOPs per cycle with AVX:

3.4 [Ginstructions/second] * 8 [FLOPs/instruction] = 27.2 GFLOP/s peak performance 

So single core performance is around 85% peak, which is to be expected for this problem.

You really need to look deeply into the capabilities of your CPU to get accurate performannce estimates.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.