Bill Greene
  • 6.4k
  • 1
  • 18
  • 26

First off, I agree with Brian Borchers' comments about profiling to make sure these element-wise multiplications are really where your performance problem lies. But since you are convinced they are, here is another suggestion.

Before trying to exploit multiple CPUs, I would make sure that your implementation exploits vectorization. The SSE2 instruction set (available on most modern x86 processors) has an instruction that multiplies packed double-precision floating-point numbers, two at a time. Depending on how your code is written, your compiler may or may not be able to use this instruction.
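To make the point about the compiler concrete, here is a minimal sketch of an element-wise multiply written so that an optimizing compiler has a good chance of vectorizing it. The function name `elementwise_multiply` is my own invention for illustration and does not come from Armadillo or Eigen. Whether the loop actually gets vectorized depends on your compiler and flags, so treat this as a starting point rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Armadillo or Eigen):
// computes out[i] = a[i] * b[i] over contiguous arrays of doubles.
std::vector<double> elementwise_multiply(const std::vector<double>& a,
                                         const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<double> out(n);

    // Plain pointers into contiguous storage. The output buffer is freshly
    // allocated here, so it does not overlap either input.
    const double* pa = a.data();
    const double* pb = b.data();
    double* po = out.data();

    // Unit stride, no branches, no function calls in the body: the kind of
    // loop a compiler can usually turn into packed multiplies.
    for (std::size_t i = 0; i < n; ++i) {
        po[i] = pa[i] * pb[i];
    }
    return out;
}
```

Build with optimization turned on (for example `-O3` with GCC or Clang). To see whether the loop was vectorized, GCC can report it with `-fopt-info-vec` and Clang with `-Rpass=loop-vectorize`.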

As far as I know, Armadillo does not have any direct support for SSE2. But since you indicated a willingness to switch libraries, the Eigen library (http://eigen.tuxfamily.org/index.php?title=Main_Page) definitely does generate code that uses the SSE2 instructions. Because a 128-bit SSE register holds four single-precision or two double-precision values, this could give you up to a 4x improvement for single-precision multiplies and up to 2x for double precision. If you are fortunate enough to have a CPU that supports the AVX instruction set, the development version of Eigen supports it as well, which provides additional speedup.
