Architecture for Large-Scale Matrix Multiplication: Distributed Architecture or One Strong Server

Question

I asked a question about Scaling Matrix Multiplication by CPU Cores on StackOverflow, and it seems that merely adding more CPU cores won't dramatically reduce the time it takes to multiply matrices.

Now I'm wondering whether a scalable, distributed architecture is the answer for large-scale matrix multiplication, or a single strong server with lots of cores and memory.

The latency of scalable architectures like Hadoop is a drawback, but I'm also wondering whether throwing more powerful CPUs (like the Intel Core i9-7980XE) at the problem would increase performance considerably.

What I'm aiming for is a high-throughput, low-latency architecture. For the sake of argument, let's pretend price is not a constraint (but please don't advise supercomputer architectures; the price of those things actually is a constraint!).

This is a little off-topic for Software Engineering SE. Questions here generally deal with software architecture problems. As they say, "Never trust a programmer with a screwdriver," and I'm inclined to agree. ;)
– Neil, Commented Jul 5, 2018 at 9:54
I thought of this too, but then where can I ask this kind of question? I'm thinking "StackExchange/Hardware Recommendations", but they may not know enough about the software implications of large-scale computations.
– Cypher, Commented Jul 5, 2018 at 9:57
@Neil: SE.SE's main topic is the systems development life cycle, not just software; check the help center. This is a more holistic view than just software, so IMHO this question is not off-topic (maybe a little bit broad).
– Doc Brown, Commented Jul 5, 2018 at 19:29
If adding more cores doesn't help, why would either a 'strong server with lots of cores' or a cluster of machines be options? The algorithm can either be implemented in parallel or it can't. Multi-core versus multi-machine would only be relevant if you can do this in parallel.
– JimmyJames, Commented Aug 16, 2018 at 17:26

Walter Kuhn · Accepted Answer · 2018-08-16 15:23:44Z

From my (slightly older) experience, the answer is: take both together.

Matrix multiplication is a classical HPCN problem, solved in libraries such as BLAS (see http://www.netlib.org/blas/). Those libraries are optimized and can also be used on supercomputers or similar scalable systems. If your matrices are really large, you could also benefit from other well-suited algorithms that split the matrices into blocks and distribute them across a network of nodes. See, for example, https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
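
To make the "split and distribute" idea concrete, here is a minimal pure-Python sketch of blocked (tiled) matrix multiplication. It is only an illustration of the decomposition, not how BLAS is implemented; the function names and the `block` parameter are my own, and in practice you would call an optimized library rather than write these loops yourself.

```python
def matmul_naive(a, b):
    """Reference triple-loop product of two list-of-lists matrices."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile.

    Each (i0, j0, p0) triple is one small block product. Block products
    for different (i0, j0) output tiles are independent of each other,
    which is what lets them be handed to different cores or nodes; the
    products that share an output tile are summed into it.
    `block` is a hypothetical tile size chosen for illustration.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ip * b[p][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_blocked(a, b))
```

On one machine the same tiling is mainly about cache reuse; across machines the tiles become the units you ship over the network.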

In the end, it is a question of communication performance (moving partial matrix blocks between nodes) versus computation performance, and of the selected algorithm (with its algorithmic complexity).
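
A rough back-of-the-envelope model can show where that trade-off tips. The sketch below assumes dense n x n matrices, the classical 2n^3-flop algorithm, and a square 2D block decomposition in which each node moves on the order of 2n^2/sqrt(p) matrix elements. It ignores network latency, overlap of communication with computation, and memory bandwidth, so treat the numbers as orders of magnitude only. All names and parameters are hypothetical.

```python
import math


def estimate_times(n, nodes, flops_per_sec, bytes_per_sec, bytes_per_elem=8):
    """Return (compute_seconds, communication_seconds) per node.

    Assumptions (not measurements):
      - dense n x n matrices, classical algorithm: 2 * n**3 flops in total,
        split evenly across `nodes`;
      - 2D block decomposition: each node sends/receives roughly
        2 * n**2 / sqrt(nodes) elements;
      - a single node does no network communication.
    """
    compute_s = 2 * n**3 / nodes / flops_per_sec
    if nodes == 1:
        comm_s = 0.0
    else:
        comm_s = 2 * n**2 / math.sqrt(nodes) * bytes_per_elem / bytes_per_sec
    return compute_s, comm_s


if __name__ == "__main__":
    # Hypothetical hardware figures, chosen only to illustrate the model.
    for nodes in (1, 4, 16, 64):
        compute_s, comm_s = estimate_times(
            n=20_000, nodes=nodes, flops_per_sec=5e11, bytes_per_sec=1e9
        )
        print(f"{nodes:3d} nodes: compute {compute_s:8.2f} s, comm {comm_s:8.2f} s")
```

In this model, compute time shrinks like 1/p while communication shrinks only like 1/sqrt(p), so adding nodes stops paying off once the network term dominates.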

The choice of algorithm also depends on the structure of your matrices (a.k.a. the problem). Are they dense?
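
To illustrate why structure matters: if most entries are zero, you can store and multiply only the nonzeros, and the work scales with the number of nonzero pairs instead of n^3. Below is a minimal sketch using a dictionary-of-keys layout. The representation and the function name are my own choices for illustration; real sparse libraries use more compact formats such as CSR.

```python
def sparse_matmul(a, b):
    """Multiply two sparse matrices stored as {(row, col): value} dicts.

    Only stored (nonzero) entries are visited, so the cost depends on how
    many nonzeros line up, not on the full matrix dimensions.
    """
    # Index B by row so each nonzero A[i, p] can find matching B[p, j] quickly.
    b_rows = {}
    for (p, j), value in b.items():
        b_rows.setdefault(p, []).append((j, value))

    c = {}
    for (i, p), a_value in a.items():
        for j, b_value in b_rows.get(p, ()):
            c[(i, j)] = c.get((i, j), 0) + a_value * b_value
    return c


if __name__ == "__main__":
    a = {(0, 0): 2, (1, 2): 3}
    b = {(0, 1): 4, (2, 0): 5}
    print(sparse_matmul(a, b))
```

For a dense matrix this layout would be slower than a plain array, which is why the dense-or-sparse question comes before the choice of algorithm and hardware.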

Thus: yes, from my point of view it is a matter of software architecture. I highly advise you to look at some of the work of Jack Dongarra (it might not be the newest, but it is an excellent starting point).
