I am using the SoftFloat library (http://www.jhauser.us/arithmetic/SoftFloat.html) to implement a single-precision division algorithm, and I am trying to understand the reciprocal approximation function it provides. The code is below. Could anyone explain how the LUT was derived? It looks like a combination of a LUT and Newton-Raphson (NR) style refinement, but a detailed explanation would definitely help.
/*
 * Returns an approximation to the reciprocal of the number represented by
 * `a', where `a' is interpreted as an unsigned fixed-point number with one
 * integer bit and 31 fraction bits.  The `a' input must be "normalized",
 * meaning that its most-significant bit (bit 31) must be 1.  Thus, if A is
 * the value of the fixed-point interpretation of `a', then 1 <= A < 2.  The
 * returned value is interpreted as a pure unsigned fraction, having no
 * integer bits and 32 fraction bits.  The approximation returned is never
 * greater than the true reciprocal 1/A, and it differs from the true
 * reciprocal by at most 2.006 ulp (units in the last place).
 */
uint32_t softfloat_approxRecip32_1( uint32_t a )
{
    int index;
    uint16_t eps;
    static const uint16_t k0s[] = {
        0xFFC4, 0xF0BE, 0xE363, 0xD76F, 0xCCAD, 0xC2F0, 0xBA16, 0xB201,
        0xAA97, 0xA3C6, 0x9D7A, 0x97A6, 0x923C, 0x8D32, 0x887E, 0x8417
    };
    static const uint16_t k1s[] = {
        0xF0F1, 0xD62C, 0xBFA1, 0xAC77, 0x9C0A, 0x8DDB, 0x8185, 0x76BA,
        0x6D3B, 0x64D4, 0x5D5C, 0x56B1, 0x50B6, 0x4B55, 0x4679, 0x4211
    };
    uint16_t r0;
    uint32_t delta0;
    uint_fast32_t r;
    uint32_t sqrDelta0;

    index = a>>27 & 0xF;
    eps = (uint16_t) (a>>11);
    r0 = k0s[index] - ((k1s[index] * (uint_fast32_t) eps)>>20);
    delta0 = ~(uint_fast32_t) ((r0 * (uint_fast64_t) a)>>7);
    r = ((uint_fast32_t) r0<<16) + ((r0 * (uint_fast64_t) delta0)>>24);
    sqrDelta0 = ((uint_fast64_t) delta0 * delta0)>>32;
    r += ((uint32_t) r * (uint_fast64_t) sqrDelta0)>>48;
    return r;
}
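To make it easier to experiment with inputs, here is a pure-Python translation of the function above (a sketch of my own; the names `approx_recip32_1`, `K0S`, and `K1S` are mine, and explicit masks stand in for the C fixed-width truncations):

```python
# Python port of softfloat_approxRecip32_1: the tables are copied from the
# C source above; masking with 0xFFFF / 0xFFFFFFFF mimics uint16_t/uint32_t.
K0S = (0xFFC4, 0xF0BE, 0xE363, 0xD76F, 0xCCAD, 0xC2F0, 0xBA16, 0xB201,
       0xAA97, 0xA3C6, 0x9D7A, 0x97A6, 0x923C, 0x8D32, 0x887E, 0x8417)
K1S = (0xF0F1, 0xD62C, 0xBFA1, 0xAC77, 0x9C0A, 0x8DDB, 0x8185, 0x76BA,
       0x6D3B, 0x64D4, 0x5D5C, 0x56B1, 0x50B6, 0x4B55, 0x4679, 0x4211)

def approx_recip32_1(a):
    """Approximate 1/A as a 0.32 fraction, for normalized a (bit 31 set)."""
    index = (a >> 27) & 0xF            # top 4 fraction bits select the subinterval
    eps = (a >> 11) & 0xFFFF           # next 16 bits: position within the subinterval
    r0 = (K0S[index] - ((K1S[index] * eps) >> 20)) & 0xFFFF  # piecewise linear seed
    delta0 = ~((r0 * a) >> 7) & 0xFFFFFFFF  # ~= (1 - r0*A) * 2^40, truncated to 32 bits
    r = (r0 << 16) + ((r0 * delta0) >> 24)  # r0 * (1 + d)
    sqr_delta0 = (delta0 * delta0) >> 32
    r += ((r & 0xFFFFFFFF) * sqr_delta0) >> 48  # further refinement by (1 + d^2)
    return r & 0xFFFFFFFF
```

For example, `approx_recip32_1(0xC0000000)` (A = 3/2) returns `0xAAAAAAAA` (2/3 as a 0.32 fraction), and sampled results stay within the documented 2.006-ulp underestimate of the true reciprocal.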
For `softfloat_approxRecip32_1()`, an input of `0x40000000` gives an output of `0xab0f9c81`, which doesn't make much sense in the context of a reciprocal computation (whether read as fixed point or floating point). Note, however, that `0x40000000` has bit 31 clear, so it violates the function's documented requirement that the input be normalized. For a valid input such as `0xc0000000` (= 3/2), the output is `0xaaaaaaaa` (= 2/3). My limited experiments suggest that the function provides an underestimate of the true mathematical result, in that all results appear to be less than, or equal to, the mathematical result, consistent with the comment's guarantee.

The use of the two tables suggests a piecewise linear approximation: `k0s` and `k1s` are the intercept and slope of a linear approximation for each of 16 subintervals, stored in 0.16 fixed-point format, that is, as purely fractional values. This gives `r0`. From `r0` it computes `r` with one NR-style refinement `r0 + r0 * (1 - r0 * a)`, with `delta0 = (1 - r0 * a)`. The `sqrDelta0` step looks like the next terms of a series expansion: writing `d = delta0` and `r1 = r0 * (1 + d)`, the final step multiplies by `(1 + d^2)`, giving `r0 * (1 + d + d^2 + d^3)`, the leading terms of the geometric series `r0 / (1 - d) = 1/A`.
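The piecewise-linear reading can be checked numerically. At the left edge of subinterval `i` (`eps` = 0) we have A = (16+i)/16 and `r0 = k0s[i]`, and the slope term `(k1s[i]*eps)>>20` reaches about `k1s[i]/16` at the right edge. Matching that to the drop of the scaled reciprocal 2^20/(16+i) across one subinterval suggests `k1s[i]` is roughly 2^24/((16+i)(17+i)). The following sketch is my own reconstruction, not taken from the library's sources:

```python
# Tables copied from the C source for comparison.
K0S = (0xFFC4, 0xF0BE, 0xE363, 0xD76F, 0xCCAD, 0xC2F0, 0xBA16, 0xB201,
       0xAA97, 0xA3C6, 0x9D7A, 0x97A6, 0x923C, 0x8D32, 0x887E, 0x8417)
K1S = (0xF0F1, 0xD62C, 0xBFA1, 0xAC77, 0x9C0A, 0x8DDB, 0x8185, 0x76BA,
       0x6D3B, 0x64D4, 0x5D5C, 0x56B1, 0x50B6, 0x4B55, 0x4679, 0x4211)

for i in range(16):
    # Over subinterval i, A runs from (16+i)/16 to (17+i)/16, so the
    # 0.16-scaled reciprocal drops by 2^20/(16+i) - 2^20/(17+i)
    # = 2^20/((16+i)(17+i)); scaled by 16 this predicts k1s[i].
    slope = 2**24 / ((16 + i) * (17 + i))
    assert abs(slope - K1S[i]) <= 1   # every entry matches within rounding

for i in range(16):
    # k0s[i] sits a little below the left-edge reciprocal 2^20/(16+i);
    # the downward bias (empirically between 10 and 60 here) is presumably
    # chosen so the seed never overestimates and the 2.006-ulp bound holds.
    bias = 2**20 / (16 + i) - K0S[i]
    assert 0 < bias < 64
```

So the slope table is, to within rounding, exactly 2^24/((16+i)(17+i)), while the intercepts appear hand-biased downward; the precise offsets were presumably tuned against the stated error bound.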