Timeline for Parallelization of element-wise matrix multiplication

7 events

when	what		by	license	comment
Aug 4, 2015 at 16:10	comment	added	mtall		Another observation: Armadillo uses column-major layout, so to get the best performance you need to work on columns instead of rows.
Aug 4, 2015 at 16:07	comment	added	mtall		To speed up your first code, use the `-O3` optimization switch in GCC or clang (or the equivalent in MSVC) to enable auto-vectorization, so that the compiler can use SSE2 instructions for Armadillo's loops. For even more speed, use `-O3 -march=native`, which enables AVX instructions on CPUs that support them. More information is on the Armadillo FAQ page.
May 10, 2015 at 1:03	answer	added	Bill Greene		timeline score: 2
May 9, 2015 at 23:43	history	edited	The Quantum Physicist	CC BY-SA 3.0	added 134 characters in body
May 9, 2015 at 23:07	answer	added	Brian Borchers		timeline score: 3
May 9, 2015 at 20:16	history	edited	The Quantum Physicist	CC BY-SA 3.0	deleted 7 characters in body
May 9, 2015 at 20:08	history	asked	The Quantum Physicist	CC BY-SA 3.0