How are arcsin and arccos typically implemented? [duplicate]

Question

I was reading through Agner's instruction tables for x86 and x87 and noticed that there is no opcode for arcsin or arccos, only for arctan. I googled it, and all the results implement them using atan and sqrt. That would suggest acos and asin should be significantly slower than atan, because they need an additional sqrt. However, I wrote a simple test program in C++, and acos and asin are both faster than atan:

#include <chrono>
#include <cmath>
#include <iostream>

class timer {
private:
    decltype(std::chrono::high_resolution_clock::now()) begin, end;

public:
    void start() { begin = std::chrono::high_resolution_clock::now(); }
    void stop() { end = std::chrono::high_resolution_clock::now(); }

    template<typename T>
    auto duration() const {
        return std::chrono::duration_cast<T>(end - begin).count();
    }

    auto nanoseconds() const { return duration<std::chrono::nanoseconds>(); }

    void printNS(char const* str) const {
        std::cout << str << ": " << nanoseconds() << std::endl;
    }
};

int main(int argc, char**) {
    timer timer;
    double p1 = 0 + 0.000000001;
    double acc1{1};

    timer.start();
    // less than 8 seconds
    for (int i{0}; 200000000 > i; ++i) {
        acc1 += std::acos(i * p1);
    }
    timer.stop();
    timer.printNS("acos");

    timer.start();
    // less than 8 seconds
    for (int i{0}; 200000000 > i; ++i) {
        acc1 += std::asin(i * p1);
    }
    timer.stop();
    timer.printNS("asin");

    timer.start();
    // more than 12 seconds
    for (int i{0}; 200000000 > i; ++i) {
        acc1 += std::atan(i * p1);
    }
    timer.stop();
    timer.printNS("atan");

    timer.start();
    // almost 20 seconds
    for (int i{0}; 200000000 > i; ++i) {
        acc1 += std::atan2(i * p1, i * p1);
    }
    timer.stop();
    timer.printNS("atan2");

    // Print the accumulator so the loops are not optimized away.
    std::cout << acc1 << '\n';
}

I tried looking at the assembly on Godbolt, but the compiler doesn't inline acos or asin.

So how are they implemented? Or, if they really just use atan, how can they be faster?
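For reference, the atan-and-sqrt formulation that the search results describe can be sketched as below. This is only an illustration of the identities asin(x) = atan2(x, sqrt(1 - x^2)) and acos(x) = atan2(sqrt(1 - x^2), x); the function names are hypothetical and are not taken from any real math library.

```cpp
#include <cmath>

// Hypothetical helper names, for illustration only.
// Both functions assume x is in [-1, 1].

// asin(x) = atan2(x, sqrt(1 - x^2))
double asin_via_atan(double x) {
    // (1 - x) * (1 + x) avoids some cancellation compared with 1 - x * x.
    return std::atan2(x, std::sqrt((1.0 - x) * (1.0 + x)));
}

// acos(x) = atan2(sqrt(1 - x^2), x)
double acos_via_atan(double x) {
    return std::atan2(std::sqrt((1.0 - x) * (1.0 + x)), x);
}
```

Written this way, each call costs one sqrt plus one atan2, which is why one would expect it to be slower than a plain atan.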

Modern math libraries do not use the x87 instructions as they are slower than direct implementations. That said, an fsqrt is quite fast with only 4 cycles on modern processors, so it doesn't matter too much.
– fuz, Commented Aug 6, 2019 at 1:02
@fuz: fsqrt is not a "complex" microcoded x87 instruction. It's one of the "basic" operations along with div/mul/add/sub that even SSE/AVX implement (sqrtsd), and that are required to have <= 0.5ulp error (i.e. correctly rounded) unlike trig / exp / log. It's also a single uop, unlike ~100 uops for x87 instructions like fsin. agner.org/optimize.
– Peter Cordes, Commented Aug 6, 2019 at 1:08
Maybe a duplicate of "How does C compute sin() and other math functions?". If you want to see the asm you actually microbenchmarked, single-step into one of those function calls with a debugger. You haven't told us what OS, compiler, or CPU microarchitecture you're using. Different OSes have different math-library implementations of complex functions.
– Peter Cordes, Commented Aug 6, 2019 at 1:16
I'm voting to close this question as off-topic because it really needs OS, exact CPU, compiler, libraries used, etc.
– Joshua, Commented Aug 6, 2019 at 1:19
This question might be a duplicate, but it's certainly not a duplicate of any of the three questions linked above, none of which deal with inverse trig functions. Also, nearly all of the answers to the linked questions are incorrect: no actual production C++ implementation does sine and cosine with raw Taylor series, for instance, despite what most answers to the second question claim.
– Daniel McLaury, Commented Aug 6, 2019 at 1:54
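
As a rough illustration of how asin can be computed without atan at all, here is a minimal sketch, assuming the input lies in [-1, 1]. It uses the half-angle identity asin(x) = pi/2 - 2 * asin(sqrt((1 - x) / 2)) to reduce any argument to |x| <= 0.5, where a short polynomial converges quickly. The function names and the term count are hypothetical, and the plain power series here only stands in for the tuned polynomial or rational approximation that a production library would use; this is not the code of any particular libm.

```cpp
#include <cmath>

const double kHalfPi = 1.5707963267948966;

// Hypothetical helper: odd power series for asin, adequate for |x| <= 0.5.
// Stand-in for a tuned approximation; 30 terms is an assumption of this sketch.
double asin_series(double x) {
    const double x2 = x * x;
    double coeff = 1.0;  // (2n)! / (4^n (n!)^2), starting at n = 0
    double power = x;    // x^(2n+1)
    double sum = 0.0;
    for (int n = 0; n < 30; ++n) {
        sum += coeff * power / (2 * n + 1);
        coeff *= (2.0 * n + 1.0) / (2.0 * n + 2.0);
        power *= x2;
    }
    return sum;
}

// Hypothetical helper: asin on [-1, 1] using only sqrt and the series above.
double asin_sketch(double x) {
    const double ax = std::fabs(x);
    if (ax <= 0.5) {
        return asin_series(x);
    }
    // Range reduction: sqrt((1 - ax) / 2) is at most 0.5 when ax is in (0.5, 1].
    const double r = kHalfPi - 2.0 * asin_series(std::sqrt((1.0 - ax) / 2.0));
    return std::copysign(r, x);  // asin is odd, so restore the sign of x
}

// acos(x) = pi/2 - asin(x); simple, though it loses accuracy near x = 1.
double acos_sketch(double x) {
    return kHalfPi - asin_sketch(x);
}
```

In this formulation the only non-trivial operation besides multiply and add is a single sqrt, which is consistent with the comments above about sqrt being a cheap, basic operation. Whether a given library actually works this way depends on the OS and libm in use.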
