
I was recently watching a great Computerphile video on passwords in which Mike Pound brags about his company's supercomputer having 4 graphics cards (Titan Xs, to be exact).

As a numerical simulation enthusiast, I dream of building a desktop solely for simulation work. Why does Mike Pound measure his computer's computational ability by its graphics cards and not its processors? If I were building a computer, which item should I care about more?

  • I don't think this is necessarily a Gorilla vs. Shark question... There's a simple question: "Why does Mike Pound measure his computer's computational ability by its graphics cards, and not its processors?" which can be answered, and its answer has constructive value for future readers. Commented Oct 5, 2017 at 4:50
  • @gnat: not even close. Of course, the question, in its current form, is not really about software engineering. But I guess it could be interpreted as a question about systems engineering, where system = "combination of hardware + software". Commented Oct 5, 2017 at 6:27
  • A computer with 4 graphics cards does not amount to a supercomputer (and neither does a cluster of 10 Raspberry Pis, for that matter). Commented Oct 5, 2017 at 8:59
  • That's just a very expensive PC setup, not a supercomputer... Commented Oct 5, 2017 at 10:14
  • Isn't the simple answer to "Why does Mike Pound measure his computer's computational ability by its graphics cards" that the context is password cracking? If your problem space is something else, what you need to care about might be something else entirely. Commented Oct 5, 2017 at 16:46

3 Answers


Mike Pound obviously values the computational ability of the graphics cards more highly than that of the CPUs.

Why? A graphics card is basically made up of MANY simplified processors which all run in parallel. For some simulation work, a lot of the computation can easily be parallelised and spread across the thousands of cores available on the graphics cards, reducing the total processing time.

Which item should I care about more?

It really depends on the workload you care about, and on how well that workload can be (or already has been) parallelised for use on a graphics card. If your workload is an embarrassingly parallel set of simple computations, and the software is written to take advantage of the available graphics cards, then more graphics cards will have a far greater performance impact than more CPUs, dollar for dollar.
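
To make "embarrassingly parallel" concrete, here is a minimal CUDA sketch (not taken from any of the answers; the array size and launch parameters are arbitrary, and error checking is omitted). Each GPU thread handles exactly one element of an element-wise operation, so no thread depends on any other and thousands of cores can work at the same time:

    // Minimal CUDA sketch: an element-wise operation where every thread
    // independently handles one array element (embarrassingly parallel).
    #include <cstdio>
    #include <vector>

    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)
            y[i] = a * x[i] + y[i];                     // no thread depends on any other
    }

    int main() {
        const int n = 1 << 20;                          // ~1M elements (arbitrary size)
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

        float *dx, *dy;
        cudaMalloc(&dx, n * sizeof(float));
        cudaMalloc(&dy, n * sizeof(float));
        cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        int threads = 256;
        int blocks = (n + threads - 1) / threads;       // enough blocks to cover n
        saxpy<<<blocks, threads>>>(n, 3.0f, dx, dy);
        cudaDeviceSynchronize();

        cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("y[0] = %f\n", hy[0]);                   // expect 3*1 + 2 = 5
        cudaFree(dx);
        cudaFree(dy);
        return 0;
    }

On a CPU the same loop body would run n times in sequence (or across a handful of cores); here the "loop" is the grid of threads itself.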

  • Adding some numbers. Let's say your main computer is an AMD Epyc server: 64 cores, 128 threads with SMT. Let's also say a graphics card "core" is only 10% as fast. ONE Titan X still has 3072 CUDA cores, roughly 12000 for the whole setup. Get the idea? IF you can run the problem on the graphics cards, it is not just "faster" - it is like comparing the speed of a horse carriage to a Formula 1 car. Commented Oct 5, 2017 at 11:00
  • +1 for "embarrassingly parallel set of simple computations". Very well written - short and to the point. Commented Oct 5, 2017 at 11:24
  • @TomTom: Actually, my preferred comparison is a Formula 1 car (your CPU) versus a bullet train. Sure, the train and the car are approximately the same speed, but the train can move 1000 people from A to B faster than the Formula 1 car can. Commented Oct 5, 2017 at 12:18
  • @slebetman The point is that the CPU is typically much faster in single-core performance (not approximately the same speed). Maybe we can compromise and compare a supersonic jet airplane with a steam locomotive. Commented Oct 5, 2017 at 13:19
  • If I have to choose a vehicle-based analogy, I'd say the CPU is like a fighter jet (it's much faster for point-to-point transport and has many tricks up its sleeve that other vehicles don't, but it can only carry a very small load), while the GPU is like a cargo ship (it can carry significantly more load in parallel, but has a much slower turnaround). Commented Oct 5, 2017 at 18:19

Check out https://developer.nvidia.com/cuda-zone (and google "CUDA NVIDIA" for lots more info). The CUDA architecture and high-end graphics cards are pretty widely used for desktop supercomputers. You can typically put together a several-TFLOP box for under $10K (USD) using off-the-shelf whitebox components.

So...

As a numerical simulation enthusiast, I dream of building a desktop solely for simulation work

...CUDA is pretty much far and away the best game in town for you. Maybe try asking again on https://scicomp.stackexchange.com/ or another Stack Exchange site more directly involved with this kind of thing.

(By the way, I assume you're comfortable with the idea that we're talking about massively parallel programming here, so you may need to get familiar with that paradigm for algorithm design.)
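
As a taste of what that paradigm shift looks like, here is a hedged sketch (again not from the answer itself; sizes and launch parameters are arbitrary, error checking omitted) of how a plain serial sum gets redesigned for a GPU: each block reduces its slice of the array in fast shared memory, and only one atomic add per block touches the global result:

    // Sketch of the algorithm-design shift: a serial sum becomes a two-level
    // parallel reduction (block-local shared memory, then one atomic per block).
    #include <cstdio>
    #include <vector>

    __global__ void sumKernel(int n, const float *x, float *result) {
        __shared__ float partial[256];                 // assumes blockDim.x == 256
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + tid;

        partial[tid] = (i < n) ? x[i] : 0.0f;
        __syncthreads();

        // Tree reduction within the block.
        for (int offset = blockDim.x / 2; offset > 0; offset >>= 1) {
            if (tid < offset)
                partial[tid] += partial[tid + offset];
            __syncthreads();
        }

        if (tid == 0)
            atomicAdd(result, partial[0]);             // one atomic add per block
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> hx(n, 1.0f);                // sum should equal n

        float *dx, *dresult;
        cudaMalloc(&dx, n * sizeof(float));
        cudaMalloc(&dresult, sizeof(float));
        cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemset(dresult, 0, sizeof(float));

        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        sumKernel<<<blocks, threads>>>(n, dx, dresult);

        float sum = 0.0f;
        cudaMemcpy(&sum, dresult, sizeof(float), cudaMemcpyDeviceToHost);
        printf("sum = %.0f (expected %d)\n", sum, n);
        cudaFree(dx);
        cudaFree(dresult);
        return 0;
    }

The serial version is one line of C; the parallel version has to think about blocks, shared memory, and synchronization - that is the paradigm you'd be getting familiar with.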

  • And we are back to Ordos, as usual. Commented Oct 5, 2017 at 11:48
  • @MichaelViktorStarberg Am I the only one not understanding the Ordos reference? Commented Oct 5, 2017 at 15:45
  • I'm afraid you are... :/ Commented Oct 5, 2017 at 16:06
  • @MarnixKlooster: I had to Google "Ordos." Not sure what a "ghost city" in China has to do with supercomputers or teraflops. Commented Oct 5, 2017 at 16:27
  • @MarnixKlooster You indeed are not. Commented Oct 5, 2017 at 18:38

If I were building a computer, which item should I care about more?

From a practical standpoint you should probably pay quite a bit of attention to the motherboard and CPU, given how much harder they are to upgrade than the GPUs. After purchase is an awful time to discover you don't have space for four GPUs, or that your processor isn't fast enough to keep them all busy.

You should also be aware that GPU performance is most often reported in single-precision FLOPS and drops quite a bit for double precision. If you need the extra precision in your simulations, you'll end up well below the advertised speed.
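
If you want to see that gap on your own hardware, a rough sketch like the one below can help (this is a crude micro-benchmark of my own, not anything from the answer; launch sizes and iteration counts are arbitrary). It times the same arithmetic-heavy kernel in float and in double; on consumer cards the double-precision run is typically several times slower, with the exact ratio depending on the device:

    // Sketch: time the same arithmetic-heavy kernel in float and in double.
    #include <cstdio>

    template <typename T>
    __global__ void fmaLoop(int iters, T *out) {
        T a = (T)1.0000001, x = (T)threadIdx.x;
        for (int k = 0; k < iters; ++k)
            x = x * a + (T)0.5;                         // chain of multiply-adds
        out[blockIdx.x * blockDim.x + threadIdx.x] = x; // prevent dead-code elimination
    }

    template <typename T>
    float timeKernelMs(int iters) {
        const int blocks = 1024, threads = 256;
        T *out;
        cudaMalloc(&out, blocks * threads * sizeof(T));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        fmaLoop<T><<<blocks, threads>>>(iters, out);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(out);
        return ms;
    }

    int main() {
        const int iters = 100000;
        printf("float:  %.2f ms\n", timeKernelMs<float>(iters));
        printf("double: %.2f ms\n", timeKernelMs<double>(iters));
        return 0;
    }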

Off to the software engineering races

There are really two primary concerns from a software standpoint: the von Neumann bottleneck and the programming model. The CPU has fairly good access to main memory, while the GPU has a large amount of faster memory on board. It is not unknown for the time spent moving data into and out of the GPU to completely negate any speed win. In general, the CPU is the winner for moderate computation on large amounts of data, while the GPU excels at heavy computation on smaller amounts. All of which brings us to the programming model.
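
A quick way to see that bottleneck is to time the transfers alongside the kernel. The sketch below (my own illustration, with arbitrary sizes and no error checking) runs a deliberately cheap kernel over roughly 64 MB of data; on a typical PCIe card the two copies dwarf the compute:

    // Sketch: compare time spent copying data to/from the GPU with time spent
    // computing on it. For a cheap operation the transfers dominate.
    #include <cstdio>
    #include <vector>

    __global__ void addOne(int n, float *x) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] += 1.0f;                        // very little work per byte moved
    }

    float elapsedMs(cudaEvent_t a, cudaEvent_t b) {
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, a, b);
        return ms;
    }

    int main() {
        const int n = 1 << 24;                          // ~16M floats, ~64 MB
        std::vector<float> h(n, 0.0f);
        float *d;
        cudaMalloc(&d, n * sizeof(float));

        cudaEvent_t t0, t1, t2, t3;
        cudaEventCreate(&t0); cudaEventCreate(&t1);
        cudaEventCreate(&t2); cudaEventCreate(&t3);

        cudaEventRecord(t0);
        cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaEventRecord(t1);
        addOne<<<(n + 255) / 256, 256>>>(n, d);
        cudaEventRecord(t2);
        cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaEventRecord(t3);
        cudaEventSynchronize(t3);

        printf("copy in:  %.2f ms\n", elapsedMs(t0, t1));
        printf("kernel:   %.2f ms\n", elapsedMs(t1, t2));
        printf("copy out: %.2f ms\n", elapsedMs(t2, t3));
        cudaFree(d);
        return 0;
    }

The win only appears when the kernel does enough work per transferred byte to amortize those copies.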

At a high level, the problem is the ancient and honored MIMD/SIMD debate. Multiple-Instruction/Multiple-Data systems have been the big winners in general and commercial computing. In this model, which includes SMP systems, there are multiple processors, each executing its own individual instruction stream. It's the computer equivalent of a French kitchen, where you direct a small number of skilled cooks to complete relatively complicated tasks.

Single-Instruction/Multiple-Data systems, on the other hand, more closely resemble a huge room full of clerks chained to their desks following instructions from a master controller: "Everybody ADD lines 3 and 5!" It was used in its pure form in the ILLIAC and some "mini-super" systems but lost out in the marketplace. Current GPUs are a close cousin; they're more flexible but share the same general philosophy.

To sum up briefly:

  • For any given operation the CPU will be faster, while the GPU can perform many simultaneously. The difference is most apparent with 64-bit floats.
  • CPU cores can operate on any memory address; data for the GPU must be packaged into a smaller area. You only win if you're doing enough computation to offset the transfer time.
  • Code heavy in conditionals will typically be happier on the CPU; see the divergence sketch after this list.
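
To illustrate that last point, here is a small divergence sketch (my own example, with arbitrary sizes and loop counts): when even and odd lanes of a 32-thread warp take different branches, the warp executes both branch bodies one after the other, so the divergent launch typically takes roughly twice as long as the warp-uniform one.

    // Sketch of warp divergence: lanes of a warp that disagree on a branch
    // run both branch bodies serially.
    #include <cstdio>

    __global__ void branchy(int n, int perWarp, float *x) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        // perWarp != 0: whole warps agree on the branch (no divergence).
        // perWarp == 0: lanes within a warp disagree (divergence).
        int take = perWarp ? (i / 32) % 2 : i % 2;
        float v = x[i];
        if (take)
            for (int k = 0; k < 256; ++k) v = v * 1.0001f + 0.1f;
        else
            for (int k = 0; k < 256; ++k) v = v * 0.9999f - 0.1f;
        x[i] = v;
    }

    float runMs(int n, int perWarp, float *dx) {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        branchy<<<(n + 255) / 256, 256>>>(n, perWarp, dx);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }

    int main() {
        const int n = 1 << 22;
        float *dx;
        cudaMalloc(&dx, n * sizeof(float));
        cudaMemset(dx, 0, n * sizeof(float));

        printf("uniform branches:   %.2f ms\n", runMs(n, 1, dx));
        printf("divergent branches: %.2f ms\n", runMs(n, 0, dx));
        cudaFree(dx);
        return 0;
    }

Branch-heavy code pays this serialization cost on every divergent warp, which is why the CPU often remains the better home for it.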
