arithmetic with double vs bit operations

Question

There is some obvious stuff I feel I should understand here, but I don't:

    void main()
    {
        long first = 0xffffffc1;
        long second = 0x92009019;

        // correct
        __int64 correct = (((__int64)first << 32) | 0x00000000ffffffff) & (0xffffffff00000000 | second);
        // output is 0xffffffc192009019

        // incorrect
        __int64 wrong = (double)(((__int64)first << 32) + second);
        // output is 0xffffffc092009019
    }

Why does the add operation affect the upper 4 bytes, and how?

(compiler is VC++ 2003)

Note that main must never return void! Its return type is always int. Other compilers correctly flag this as an error.
– Konrad Rudolph, Commented Apr 18, 2011 at 14:06
The cast to double in __int64 wrong = (double)... can be removed; it does not affect the outcome.
– user581243, Commented Apr 18, 2011 at 15:29

Lindydancer · Accepted Answer · 2011-04-18 14:40:06Z

4

Probably because second is signed, which means that 0x92009019 is negative.
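A minimal sketch of that effect, assuming two's complement and using std::int32_t in place of the question's 32-bit long. The names as_signed, add_halves and demo_add_halves are made up for this illustration; the arithmetic is done in unsigned 64-bit values so that the wrap-around is well defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reinterpret a 32-bit pattern as a signed value,
// standing in for the question's 32-bit `long` (two's complement assumed).
constexpr std::int32_t as_signed(std::uint32_t bits) {
    return static_cast<std::int32_t>(bits);
}

// Mirrors `((__int64)first << 32) + second` from the question.
constexpr std::uint64_t add_halves(std::int32_t first, std::int32_t second) {
    // Both conversions sign-extend, because the source type is signed.
    const std::uint64_t high = static_cast<std::uint64_t>(first) << 32;
    const std::uint64_t low = static_cast<std::uint64_t>(second);
    return high + low;
}

inline void demo_add_halves() {
    const std::int32_t first = as_signed(0xffffffc1u);
    const std::int32_t second = as_signed(0x92009019u);

    // `second` widens to 0xffffffff92009019, not 0x0000000092009019.
    assert(static_cast<std::uint64_t>(second) == 0xffffffff92009019ULL);

    // Adding the extra 0xffffffff in the upper half lowers it by one:
    // 0xffffffc1 becomes 0xffffffc0, as seen in the question.
    assert(add_halves(first, second) == 0xffffffc092009019ULL);
}
```

Calling demo_add_halves() from main should pass both assertions: the sign-extended upper bits of second are what turn c1 into c0.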

EDIT: The question actually contains two questions.

1) How do you join two 32 bit numbers to a 64 bit value?

Answer:

(((uint64_t)first) << 32) | (uint32_t)second
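As a rough sketch, the expression above could be wrapped like this. The function name join_halves and the demo function are hypothetical, and std::int32_t is assumed to match the question's 32-bit long.

```cpp
#include <cassert>
#include <cstdint>

// Joins two 32-bit halves into one 64-bit value. The low half goes through
// uint32_t first, so it is zero-extended and cannot disturb the upper 32 bits.
constexpr std::uint64_t join_halves(std::int32_t first, std::int32_t second) {
    return (static_cast<std::uint64_t>(first) << 32) |
           static_cast<std::uint32_t>(second);
}

inline void demo_join_halves() {
    const std::int32_t first = static_cast<std::int32_t>(0xffffffc1u);
    const std::int32_t second = static_cast<std::int32_t>(0x92009019u);
    assert(join_halves(first, second) == 0xffffffc192009019ULL);
}
```

On a compiler without the standard type names, such as VC++ 2003, unsigned __int64 and unsigned __int32 would take the place of the fixed-width types.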

2) Is it wise to do bit operations using the floating-point type double?

Answer: No, it's not. Please use the right tool for the job. If you want to do bit operations, use integers. If you want (almost) continuous values, use floating-point values.

edited Apr 18, 2011 at 14:40

answered Apr 18, 2011 at 13:47

Lindydancer


5 Comments

T.E.D. Over a year ago

Actually, they both are negative, no?

Lindydancer Over a year ago

Yes, they are both negative. However, when you cast first to the 64 bit type and left-shift it, you get the same underlying bit pattern as if it had been unsigned. Later, when you add the signed second, it affects the upper part if it is negative. If you added it as an unsigned value, it would always fit nicely into the lower 32 bits that were cleared by the left shift of first. In fact, I would have written | instead of + to indicate that I'm joining bit patterns rather than doing normal arithmetic.

user581243 Over a year ago

(unsigned __int64)first << 32 | (unsigned __int64)second; gives me 0xffffffff92009019...uint64_t fails to compile on VC++ 2003

Lindydancer Over a year ago

Try (unsigned __int32)second (not 64), which should correspond to what I wrote in my answer. I used uint32_t etc. in my answer as those correspond to the standard type names (which VC++ 2003 doesn't follow). By casting directly to a 64 bit type, you still get the sign extension that, as I explained earlier, caused your original problem.

user581243 Over a year ago

works OK. I expected (uint64_t)0x92009019 to yield 0x0000000092009019, but it gives 0xffffffff92009019...

AProgrammer · Accepted Answer · 2011-04-18 14:02:50Z

3

A double has 53 bits of precision. I'm quite surprised you got the last digits right. (The first wrong digit is explained by Lindydancer.)

Edit: I'm no longer surprised: as the result is negative, you need only 38 bits of precision with your data. If you use

first = 0xffdfffc1;

you lose the LSB with the double solution.
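A small sketch to check this, assuming an IEEE 754 binary64 double and two's complement integers. The helper survives_double is a made-up name; it only reports whether a 64-bit integer comes back unchanged after a trip through double.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<double>::digits == 53,
              "this sketch assumes an IEEE 754 binary64 double");

// True when v survives the trip int64 -> double -> int64 unchanged.
// Only meant for values well inside the int64 range, as in the examples here.
inline bool survives_double(std::int64_t v) {
    volatile double d = static_cast<double>(v);  // volatile: force rounding to double
    return static_cast<std::int64_t>(d) == v;
}

inline void demo_survives_double() {
    // Sum from the question: the magnitude fits in 38 bits, so it is exact.
    assert(survives_double(static_cast<std::int64_t>(0xffffffc092009019ULL)));

    // Same sum with first = 0xffdfffc1: the magnitude needs 54 bits and is
    // odd, so the lowest bit cannot be represented.
    assert(!survives_double(static_cast<std::int64_t>(0xffdfffc092009019ULL)));
}
```

The second value is what the question's addition would produce with first = 0xffdfffc1 and the original second.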

edited Apr 18, 2011 at 14:02

answered Apr 18, 2011 at 13:49

AProgrammer


4 Comments

vartec Over a year ago

you're on 54-bit machine? ;-)

user581243 Over a year ago

yeah that too, long is 32-bits

AProgrammer Over a year ago

The significand (aka mantissa) of the most common double format has 53 bits of precision.

Lindydancer Over a year ago

A double is a floating-point value (i.e. a number of digits plus an exponent). All other operations in the question are on integers. The comments in the question suggested that you had problems creating a 64 bit integer value even before the result was cast to the floating-point value. The best way to sort out what you really want to know is to split this question in two: 1) How do you join two 32 bit integers to a 64 bit integer, and 2) is it wise to do bit operations on a double value (short answer: no, it's not).
