@Artoria2e5 Artoria2e5 commented Jul 14, 2021

The SSE implementation of the function uses single-precision float, whereas this one goes for... double all over the place.

Extremely unscientific comparisons on godbolt (https://gcc.godbolt.org/z/oa3hP5ffs) show that both Clang and GCC generate much better x64 code when float is used. Other SIMD targets should behave similarly, but I can't remember the target names. (For more human-like code in Clang, try -Ofast. I could add an attribute or some assumes there, but ehhhh... sounds unnecessary.)
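
To illustrate the kind of difference being compared, here is a minimal sketch, not the actual function touched by this change. The `pixel_f` struct, its field names, and both function names are assumptions made up for illustration. It computes the same squared distance once in float and once in double, which can be pasted into a compiler explorer to compare the generated code.

```c
/* Hypothetical 4-channel pixel; type and field names are assumptions. */
typedef struct { float a, r, g, b; } pixel_f;

/* All-float version: the four channel differences fit in a single
 * 128-bit SSE register (4 x 32-bit lanes), so no widening is needed. */
static float diff_float(const pixel_f *x, const pixel_f *y)
{
    const float da = x->a - y->a;
    const float dr = x->r - y->r;
    const float dg = x->g - y->g;
    const float db = x->b - y->b;
    return da * da + dr * dr + dg * dg + db * db;
}

/* Double version: every float input is converted to double first, and
 * a 128-bit register only holds 2 x 64-bit lanes. */
static double diff_double(const pixel_f *x, const pixel_f *y)
{
    const double da = (double)x->a - (double)y->a;
    const double dr = (double)x->r - (double)y->r;
    const double dg = (double)x->g - (double)y->g;
    const double db = (double)x->b - (double)y->b;
    return da * da + dr * dr + dg * dg + db * db;
}
```

Both versions return the same value for inputs that are exactly representable; they differ in rounding for general inputs and in how compactly the compiler can vectorize them.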

PS: It might be a good idea to review other internal uses of double too. The two cases left appear to be sums and other statistics, which I guess are better off with double, and gamma, which has an external double API.

@kornelski kornelski merged commit c2ce900 into ImageOptim:master Jul 14, 2021
@kornelski

Thank you
