Timeline for Why did some early CPUs use external math chips?
Current License: CC BY-SA 4.0
10 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| Jul 26, 2022 at 4:10 | history | edited | Omar and Lorraine | CC BY-SA 4.0 | deleted 243 characters in body |
| Jul 10, 2019 at 12:08 | comment | added | Ruslan | @ToddWilcox I wonder how long it would take to run with a coprocessor. | |
| Apr 7, 2018 at 13:30 | comment | added | Todd Wilcox | Adding the math coprocessor was not cheap back in the day. Having the option to not get one was nice for anyone buying on a budget who wasn't going to do a lot of calculations. On the other hand, sometimes my father would run a spreadsheet calculation that would take more than 24 hours without a math coprocessor. | |
| Apr 7, 2018 at 1:05 | comment | added | Peter Cordes | Separate doesn't make sense today because the performance wouldn't be acceptable; lots of code these days depends on having fast FPUs. The FPUs need access to the L1d cache, which is built into each core. Also, modern CPUs have more transistors than they can power at any one time without melting (en.wikipedia.org/wiki/Dark_silicon), so having lots of dedicated hardware for stuff you might need (to run it fast when you do need it) but don't use all the time is less of a burden than in the past. | |
| Apr 7, 2018 at 1:01 | comment | added | Peter Cordes | See agner.org/optimize for instruction throughput/latency / execution port numbers. The extra 512-bit-wide FMA hardware is so big that some SKL-X models only come with one working 512-bit FMA unit, but still always two per clock throughput for 256-bit FMAs. And some funky combining of the FMA units on port 0/1 and powering up the extra one on port 5 when 512-bit instructions are in flight. anandtech.com/show/11550/… | |
| Apr 7, 2018 at 0:58 | comment | added | Peter Cordes | The FMA units in Haswell and Skylake (desktop) are probably the largest single chunk of logic (not cache) on each core, and the FP dividers are probably considerable too. Those chips have a throughput of 2 per clock for SIMD vector FMA, on 256-bit ymm registers. So that's 2x 8x 32-bit single-precision FMA, and 2x 4x 64-bit double-precision FMA. (Probably sharing some of the multiplier transistors between single / double precision mantissa multipliers). AVX512 makes Skylake-X even heavier, doubling the width of the FMA units. Skylake did drop the separate adder unit, running it on the FMAs. | |
| Apr 4, 2018 at 14:46 | history | suggested | David Richerby | CC BY-SA 3.0 | Removed tautology from first sentence |
| Apr 4, 2018 at 14:45 | review | Suggested edits | |||
| Apr 4, 2018 at 14:10 | history | edited | Omar and Lorraine | CC BY-SA 3.0 | added 277 characters in body |
| Apr 4, 2018 at 8:31 | history | answered | Omar and Lorraine | CC BY-SA 3.0 | |
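The FMA figures in Peter Cordes's comments translate into a peak arithmetic rate per core per clock. The Python sketch below works that out under stated assumptions: one fused multiply-add (a*b + c) is counted as two floating-point operations (the usual convention), clock-speed changes under heavy AVX-512 load are ignored, and the helper names `lanes` and `peak_flops_per_cycle` are hypothetical, chosen for illustration only.

```python
# Peak floating-point throughput per core per clock, from FMA unit count and vector width.
# Assumption: one FMA (a*b + c) counts as 2 FLOPs; clock-frequency effects are ignored.

def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of the given width that fit in one SIMD register."""
    return vector_bits // element_bits

def peak_flops_per_cycle(fma_units: int, vector_bits: int, element_bits: int) -> int:
    """FMA units x lanes per vector x 2 FLOPs per FMA."""
    return fma_units * lanes(vector_bits, element_bits) * 2

# (description, number of FMA units, vector width in bits), per the comments above
configs = [
    ("Haswell/Skylake desktop, 256-bit ymm", 2, 256),
    ("Skylake-X, two 512-bit FMA units", 2, 512),
    ("Skylake-X, one 512-bit FMA unit", 1, 512),
]

for name, units, bits in configs:
    sp = peak_flops_per_cycle(units, bits, 32)  # 32-bit single precision
    dp = peak_flops_per_cycle(units, bits, 64)  # 64-bit double precision
    print(f"{name}: {sp} SP / {dp} DP FLOPs per cycle")
```

With two 256-bit FMA units this gives 32 single-precision and 16 double-precision FLOPs per cycle. A Skylake-X part with only one working 512-bit FMA unit lands at the same peak as two 256-bit FMAs per clock, which is consistent with the comment that such models still sustain two 256-bit FMAs per clock.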