4

After some latency measurement tests, I figured out I need to optimize an pythagoras triangle calculation done on an embedded CPU with a pretty slow FPU.

The problem is if these calculations occur, they come in numbers and this messes up the timing. I cannot reduce the absolute number of calculations. But somehow they need to get faster ... by at least factor 5. :-/

I'm currently thinking of pre-processing these calculations since the input range of distinct values is somehow limited to about 300-500 permutations and interpolation between two table entry should suffice. But I was also wondering if using some conditions to the problem it might be possible to also speed up this code:

float h = 0.f, v=0.f; /// ... float const d = std::sqrt( (h*h) + (v*v) ); 

This I haven't used yet:

  1. The accuracy of result d is very limited to not more as 3 fractional digits are required
  2. The legs of the triangle (h,v) are always at aspect ratio of 4:3 or 16:9

I don't know if some integer fixed point calculation are available fort square root or if the function can be substituted to a one with less accuracy or somehow using the aspect ratio.

Any ideas?

Thank you!

3
  • 1
    Could you perhaps use that at 4:3, the diagonal is long side*1.25 and at 16:9 it's about long side*1.147? Commented Aug 9, 2013 at 13:33
  • Asking about a micro-optimization, it helps to mention what the CPU is. -1 Commented Aug 9, 2013 at 13:35
  • In some situations you may be able to avoid taking the root at all. E.g. when interested in comparisons only. Finding the minimum length can be replaced with finding the minimum squared length. Possibly not useful in this case however. Commented Aug 9, 2013 at 13:50

2 Answers 2

6

If you know the ratio is 16:9, you can do a little algebra:

h = 16*x v = 9*x x = h/16 sqrt((h*h) + (v*v)) = sqrt((16*16*x*x) + (9*9*x*x)) = sqrt((16*16+9*9)*x*x) = sqrt(16*16+9*9)*x = sqrt(16*16+9*9)*h/16 = sqrt(16*16+9*9)/16 * h 

pre-compute sqrt(16*16+9*9)/16:

static float const multiplier = std::sqrt(16*16+9*9)/16.0; 

and use

float const d = multiplier*h; 
Sign up to request clarification or add additional context in comments.

4 Comments

Yap, this seems to be the solution.
So if h goes up, d goes down?
@JoachimIsaksson: oops -- fixed
Wow. This looks very good... I thought the aspect ratio might be usable, interesting. I will test this ASAP!
1

I don't know how much RAM/ROM you have in this embedded system, but if you have some, pre-computing squares and square roots is probably the way to go. This assumes that your embedded CPU has the ability to cache and/or RAM/ROM access is faster than the floating point computations.

That said, there are numerical methods to calculate square root, but I am not sure that they would in practice be faster than the sqrt call, no matter how slow.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.