10 events
when what by license comment
Jul 26, 2022 at 4:10 history edited Omar and Lorraine CC BY-SA 4.0
deleted 243 characters in body
Jul 10, 2019 at 12:08 comment added Ruslan @ToddWilcox I wonder how long it would have taken with a coprocessor.
Apr 7, 2018 at 13:30 comment added Todd Wilcox Adding the math coprocessor was not cheap back in the day. Having the option to not get one was nice for anyone buying on a budget who wasn't going to do a lot of calculations. On the other hand, sometimes my father would run a spreadsheet calculation that would take more than 24 hours without a math coprocessor.
Apr 7, 2018 at 1:05 comment added Peter Cordes A separate FPU doesn't make sense today because the performance wouldn't be acceptable; lots of code these days depends on having fast FPUs. The FPUs need access to the L1d cache, which is built into each core. Also, modern CPUs have more transistors than they can power at any one time without melting (en.wikipedia.org/wiki/Dark_silicon), so having lots of dedicated hardware for stuff you might need (to run it fast when you do need it) but don't use all the time is less of a burden than in the past.
Apr 7, 2018 at 1:01 comment added Peter Cordes See agner.org/optimize for instruction throughput/latency / execution port numbers. The extra 512-bit-wide FMA hardware is so big that some SKL-X models only come with one working 512-bit FMA unit, but they still always have a throughput of two per clock for 256-bit FMAs. There is also some funky combining of the FMA units on ports 0/1, and powering up of the extra one on port 5, when 512-bit instructions are in flight. anandtech.com/show/11550/…
Apr 7, 2018 at 0:58 comment added Peter Cordes The FMA units in Haswell and Skylake (desktop) are probably the largest single chunk of logic (not cache) on each core, and the FP dividers are probably considerable too. Those chips have a throughput of 2 per clock for SIMD vector FMA on 256-bit ymm registers. That's 2x 8x 32-bit single-precision FMAs, or 2x 4x 64-bit double-precision FMAs, per clock. (They probably share some of the multiplier transistors between the single- and double-precision mantissa multipliers.) AVX512 makes Skylake-X even heavier, doubling the width of the FMA units. Skylake did drop the separate adder unit, running FP adds on the FMA units instead.
S Apr 4, 2018 at 14:46 history suggested David Richerby CC BY-SA 3.0
Removed tautology from first sentence
Apr 4, 2018 at 14:45 review Suggested edits
S Apr 4, 2018 at 14:46
Apr 4, 2018 at 14:10 history edited Omar and Lorraine CC BY-SA 3.0
added 277 characters in body
Apr 4, 2018 at 8:31 history answered Omar and Lorraine CC BY-SA 3.0