Timeline for Why did some early CPUs use external math chips?

Current License: CC BY-SA 4.0

10 events

when	what		by	license	comment
Jul 26, 2022 at 4:10	history	edited	Omar and Lorraine	CC BY-SA 4.0	deleted 243 characters in body
Jul 10, 2019 at 12:08	comment	added	Ruslan		@ToddWilcox I wonder how long it would have taken with a coprocessor.
Apr 7, 2018 at 13:30	comment	added	Todd Wilcox		Adding the math coprocessor was not cheap back in the day. Having the option to not get one was nice for anyone buying on a budget who wasn't going to do a lot of calculations. On the other hand, sometimes my father would run a spreadsheet calculation that would take more than 24 hours without a math coprocessor.
Apr 7, 2018 at 1:05	comment	added	Peter Cordes		A separate FPU doesn't make sense today because the performance wouldn't be acceptable; lots of code these days depends on having fast FPUs. The FPUs need access to the L1d cache, which is built into each core. Also, modern CPUs have more transistors than they can power at any one time without melting (en.wikipedia.org/wiki/Dark_silicon), so having lots of dedicated hardware for stuff you might need (to run it fast when you do need it) but don't use all the time is less of a burden than in the past.
Apr 7, 2018 at 1:01	comment	added	Peter Cordes		See agner.org/optimize for instruction throughput/latency / execution port numbers. The extra 512-bit-wide FMA hardware is so big that some SKL-X models only come with one working 512-bit FMA unit, but still always have two-per-clock throughput for 256-bit FMAs. There is also some funky combining of the FMA units on ports 0/1, and powering up of the extra one on port 5, when 512-bit instructions are in flight. anandtech.com/show/11550/…
Apr 7, 2018 at 0:58	comment	added	Peter Cordes		The FMA units in Haswell and Skylake (desktop) are probably the largest single chunk of logic (not cache) on each core, and the FP dividers are probably considerable too. Those chips have a throughput of 2 per clock for SIMD vector FMA on 256-bit ymm registers. So that's 2x 8x 32-bit single-precision FMA, or 2x 4x 64-bit double-precision FMA. (Probably sharing some of the multiplier transistors between single / double precision mantissa multipliers). AVX512 makes Skylake-X even heavier, doubling the width of the FMA units. Skylake did drop the separate adder unit, running FP adds on the FMA units instead.
S Apr 4, 2018 at 14:46	history	suggested	David Richerby	CC BY-SA 3.0	Removed tautology from first sentence
Apr 4, 2018 at 14:45	review	Suggested edits
S Apr 4, 2018 at 14:46
Apr 4, 2018 at 14:10	history	edited	Omar and Lorraine	CC BY-SA 3.0	added 277 characters in body
Apr 4, 2018 at 8:31	history	answered	Omar and Lorraine	CC BY-SA 3.0
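
The throughput figures in Peter Cordes's comments can be turned into a rough peak-FLOPs-per-clock estimate. The sketch below is only an illustration of that arithmetic: the function name `fma_flops_per_cycle` and its parameters are hypothetical, and it assumes the usual convention that one fused multiply-add counts as two floating-point operations. It takes the "2 per clock" and register-width numbers from the comments at face value.

```python
def fma_flops_per_cycle(units: int, vector_bits: int, element_bits: int) -> int:
    """Rough peak floating-point operations per clock for one core.

    units        -- FMA instructions that can start per clock (2 in the comments)
    vector_bits  -- SIMD register width (256 for ymm, 512 for AVX512)
    element_bits -- 32 for single precision, 64 for double precision
    """
    lanes = vector_bits // element_bits  # elements processed per instruction
    return units * lanes * 2             # assumption: one FMA = multiply + add


# Haswell / Skylake (desktop), 256-bit ymm registers
print(fma_flops_per_cycle(2, 256, 32))  # 2 x 8 lanes x 2 = 32 (single precision)
print(fma_flops_per_cycle(2, 256, 64))  # 2 x 4 lanes x 2 = 16 (double precision)

# Skylake-X model with two working 512-bit FMA units
print(fma_flops_per_cycle(2, 512, 32))  # 2 x 16 lanes x 2 = 64 (single precision)
```

These are theoretical per-core peaks; real code rarely sustains them, and SKL-X models with only one working 512-bit FMA unit would correspond to `units=1` for 512-bit instructions.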