  • 1
    Using double temporaries is only free on x86 with the x87 FPU, not with SSE2. Auto-vectorizing a loop with double temporaries means unpacking float to double, which takes an extra instruction, and you process half as many elements per vector. Without auto-vectorization, the conversion can usually happen on the fly during a load or store, but it still means extra instructions when you mix floats and doubles in expressions. Commented Apr 27, 2016 at 19:17
  • 1
    On modern x86 CPUs, div and sqrt are faster for float than double, but other things are the same speed (not counting the SIMD vector width issue, or memory bandwidth / cache footprint of course). Commented Apr 27, 2016 at 19:18
  • @PeterCordes thanks for expanding on some points. I was not aware of the div and sqrt disparity. Commented Apr 28, 2016 at 8:03
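
To make the point about double temporaries concrete, here is a minimal sketch in C. The function names `scale_float` and `scale_double_tmp` are hypothetical and only for illustration. Both loops read and write `float` arrays; the second does its arithmetic in `double`, so each element has to be widened before the math and narrowed afterwards. Whether a particular compiler vectorizes either loop depends on the compiler and its flags, so treat this as an illustration of the source-level difference, not a benchmark.

```c
#include <stddef.h>

/* All-float arithmetic: no conversions are needed, and a 128-bit SSE
   vector can hold four float elements at a time. */
void scale_float(const float *x, float *y, size_t n, float a, float b)
{
    for (size_t i = 0; i < n; i++)
        y[i] = x[i] * a + b;
}

/* Double temporaries (hypothetical variant): every element is converted
   float -> double before the arithmetic and double -> float on the store.
   A 128-bit SSE2 vector holds only two doubles, so a vectorized version
   handles half as many elements per vector and needs extra conversion
   instructions. */
void scale_double_tmp(const float *x, float *y, size_t n, float a, float b)
{
    for (size_t i = 0; i < n; i++) {
        double t = (double)x[i] * (double)a + (double)b;
        y[i] = (float)t;
    }
}
```

Comparing the generated assembly for the two functions (for example with `-O3 -S`) is one way to see the extra conversion instructions on a given compiler.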