I am writing a program for embedded hardware that only supports 32-bit single-precision floating-point arithmetic. The algorithm I am implementing, however, requires 64-bit double-precision addition and comparison. I am trying to emulate the double datatype using a pair of floats, so a double d will be emulated as a struct containing the tuple (float d.hi, float d.low).
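A minimal sketch of that representation in C, assuming IEEE 754 binary32 floats with round-to-nearest-even and float expressions evaluated in float precision. The type name `ff_t` and the helpers `ff_from_double`/`ff_to_double` are hypothetical; the conversions use `double`, so they are only meant for testing on a desktop host, not for the target. The pair stands for the unevaluated sum `hi + low`, kept normalised so that `hi` equals `hi + low` rounded to float.

```c
/* Hypothetical type: the value is the unevaluated sum hi + low, normalised
   so that hi == (float)(hi + low) and |low| is at most half an ulp of hi. */
typedef struct {
    float hi;
    float low;
} ff_t;

/* Host-side test helper only (the target has no double). Splits d into the
   nearest float plus a float remainder, keeping roughly 48 significand bits. */
static ff_t ff_from_double(double d) {
    ff_t r;
    r.hi = (float)d;                    /* nearest float to d */
    r.low = (float)(d - (double)r.hi);  /* d - hi is exact in double; rounded to float here */
    /* Renormalise with Dekker's fast two-sum in case rounding `low` landed
       hi + low on a tie that would not round back to hi. */
    float s = r.hi + r.low;
    r.low = r.low - (s - r.hi);
    r.hi = s;
    return r;
}

/* Host-side test helper only: collapse the pair back into one double. */
static double ff_to_double(ff_t a) {
    return (double)a.hi + (double)a.low;
}
```

For π, `hi` should come out as the float nearest π and `low` as a remainder of roughly -8.7e-8, which together recover π to about 15 significant digits.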
The comparison should be straightforward using lexicographic ordering. The addition, however, is a bit tricky because I am not sure which base I should use. Should it be FLT_MAX? And how can I detect a carry?
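One common approach, sketched here rather than taken from the question, is to use no fixed base such as FLT_MAX at all: the pair is an unevaluated sum, and the "carry" is recovered with error-free transformations (Knuth's TwoSum and Dekker's FastTwoSum), the same building blocks double-double libraries such as QD use. The sketch assumes IEEE 754 binary32 with round-to-nearest-even, float expressions evaluated in float (FLT_EVAL_METHOD == 0), and no -ffast-math or reassociation, which would optimise the error terms away. All names are hypothetical, and NaN and infinity are not handled.

```c
/* Hypothetical float-float type: value = hi + low (unevaluated sum),
   normalised so that hi == (float)(hi + low). */
typedef struct {
    float hi;
    float low;
} ff_t;

/* Knuth's TwoSum: returns s = fl(a + b) and stores the exact rounding
   error in *err, so that s + *err == a + b exactly. No ordering needed. */
static float two_sum(float a, float b, float *err) {
    float s = a + b;
    float bb = s - a;
    *err = (a - (s - bb)) + (b - bb);
    return s;
}

/* Dekker's FastTwoSum: same result as two_sum but cheaper; only valid when
   a == 0 or the exponent of a is at least that of b. */
static float fast_two_sum(float a, float b, float *err) {
    float s = a + b;
    *err = b - (s - a);
    return s;
}

/* Addition of two normalised pairs: add high and low parts separately,
   then renormalise twice so the result is again a normalised pair. */
static ff_t ff_add(ff_t a, ff_t b) {
    float s2, t2;
    float s1 = two_sum(a.hi, b.hi, &s2);    /* high parts and their rounding error */
    float t1 = two_sum(a.low, b.low, &t2);  /* low parts and their rounding error */
    s2 += t1;
    s1 = fast_two_sum(s1, s2, &s2);         /* the "carry": move what fits into hi */
    s2 += t2;
    s1 = fast_two_sum(s1, s2, &s2);
    ff_t r = { s1, s2 };
    return r;
}

/* Lexicographic comparison. Correct for normalised pairs because
   hi == fl(hi + low) and rounding is monotonic. Returns -1, 0 or 1. */
static int ff_cmp(ff_t a, ff_t b) {
    if (a.hi != b.hi) return (a.hi < b.hi) ? -1 : 1;
    if (a.low != b.low) return (a.low < b.low) ? -1 : 1;
    return 0;
}
```

For example, adding (16777216, 0) and (1, 0) gives (16777216, 1), keeping the 1 that a plain float sum would round away. A cheaper "sloppy" variant adds `a.low + b.low` straight into the error of the high parts, but it can lose accuracy when the operands have opposite signs and nearly cancel.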
How can this be done?
Edit (Clarity): I need the extra significant digits rather than the extra range.
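As a rough yardstick, assuming IEEE 754 binary32 (24-bit significand) and binary64 (53-bit significand), a normalised pair carries about twice the float significand but keeps the float exponent range, which matches what the edit asks for. Here u is the unit roundoff, half the spacing of representable values just above 1:

$$
\begin{aligned}
u_{\text{float}} &= 2^{-24} \approx 6.0 \times 10^{-8} \\
u_{\text{pair}} &\approx 2^{-48} \approx 3.6 \times 10^{-15} \\
u_{\text{double}} &= 2^{-53} \approx 1.1 \times 10^{-16} \\
|x|_{\max} &\approx \texttt{FLT\_MAX} \approx 3.4 \times 10^{38} \quad (\text{same as float})
\end{aligned}
$$

So the pair falls about 5 bits short of a true double (roughly 48 against 53 significand bits).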
double, or just the extra significant digits? double precision. Specifically, 1.0E+20 and 1.0E-03 differ by far more than double can resolve: their ratio, about 1.0E-23, is much smaller than epsilon (for double this is about 2.2E-16), so I'd expect that operations like 1.0E+20 + 1.0E-03 would equate to 1.0E+20, even when using double. Will that be an issue?
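The absorption described in that comment can be checked on a host machine with a small sketch like the one below, assuming IEEE 754 binary64 doubles with round-to-nearest. The helper name `absorbed_in_double` is hypothetical. Because 1.0E-03 is far below half an ulp of 1.0E+20 (that ulp is 2^14 = 16384), the small term should vanish from the sum.

```c
/* Hypothetical helper: returns 1 when adding `small` to `big` in double
   leaves `big` unchanged, i.e. |small| is at most about half an ulp of big
   and rounding discards it; returns 0 otherwise. */
static int absorbed_in_double(double big, double small) {
    /* volatile forces the sum to be stored, and so rounded, as a double
       even where intermediates are wider (FLT_EVAL_METHOD == 2, e.g. x87). */
    volatile double sum = big + small;
    return sum == big;
}
```

With these assumptions, `absorbed_in_double(1.0e20, 1.0e-3)` should return 1, while `absorbed_in_double(1.0, 1.0e-3)` should return 0.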