Not an answer, but some extended comments.

Sparse matrix-sparse matrix multiplication

MathematicaLover made the very good observation that, as written by the OP, Mathematica performs sparse matrix-sparse matrix multiplications. In fact, Armadillo does so, too, and it seems to be about twice as fast at it. I was quite surprised by the margin by which Armadillo was faster. Now I believe I have an explanation: the compiler is probably clever enough to optimize in the following spirit:

First@AbsoluteTiming[
 H10slow = H.H.H.H.H.H.H.H.H.H;
 Total[H10slow.f1];
]
First@AbsoluteTiming[
 H2 = H.H;
 H4 = H2.H2;
 H10 = H2.H4.H4;
 Total[H10.f1];
]
H10 == H10slow

1.27093

0.440999

True

Compared to this, the Armadillo function takes 0.631295 seconds on my machine (an Intel Haswell, so an architecture for which MKL is highly optimized).

Hence Armadillo does not really seem to be better than Mathematica+MKL in this benchmark; it is the C++ compiler that makes the difference, because it is able to do automatically what I did here by hand.
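
For illustration, here is a minimal sketch of the same regrouping on the Armadillo side; the function name and signature are my own assumptions, not code from the original benchmark:

#include <armadillo>

// Hypothetical sketch: three sparse-sparse products (for H^2, H^4, H^10)
// instead of the nine products in the literal chain H*H*...*H, followed
// by a single sparse matrix-vector product.
arma::vec H10TimesF(const arma::sp_mat& H, const arma::vec& f)
{
    const arma::sp_mat H2  = H  * H;        // H^2
    const arma::sp_mat H4  = H2 * H2;       // H^4
    const arma::sp_mat H10 = H2 * H4 * H4;  // H^10
    const arma::vec result = H10 * f;       // dense result vector
    return result;
}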

Sparse matrix-vector multiplication

Let's also compare Armadillo's sparse matrix-vector multiplication to Mathematica's (or rather MKL's). To that end, I compiled the function ArmadilloDot with

F3 = H*H*H*H*H*H*H*H*H*H*f2; 

replaced by

F3 = H*(H*(H*(H*(H*(H*(H*(H*(H*(H*f2))))))))); 
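
In C++ terms, the matrix-vector variant might look like the following sketch; the function name, signature, and loop formulation are my assumptions (the post shows only the expression above), but the loop makes explicit that only sparse matrix-vector products are performed:

#include <armadillo>

// Hypothetical sketch of the matrix-vector variant. Equivalent to the
// fully parenthesized expression above: ten sparse matrix-vector
// products, and no sparse matrix-matrix product is ever formed.
arma::vec ArmadilloDotSketch(const arma::sp_mat& H, const arma::vec& f2)
{
    arma::vec x = f2;
    for (int i = 0; i < 10; ++i)
    {
        const arma::vec y = H * x;  // one SpMV per iteration
        x = y;                      // explicit temporary avoids aliasing
    }
    return x;
}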

Then I ran

RepeatedTiming[Total[arma@"ArmadilloDot"[H, f1]]]
RepeatedTiming[Total[Nest[H.# &, f1, 10]]]
RepeatedTiming[Total[MatrixPower[H, 10, f1]]]

and got these results:

{0.0148027, -9.3592*10^6}

{0.00386909, -9.3592*10^6}

{0.00334479, -9.3592*10^6}

So the Armadillo variant is actually awfully slow!

Disclaimer

Once again, all timing results are strongly hardware- and software-dependent. I only tried to point out that performance tests have to be done thoroughly, and that MKL is a really good math library if you use an Intel CPU. It is, however, notorious for nerfing AMD CPUs.
