2

There is some obvious stuff I feel I should understand here, but I don't:

void main()
{
    long first = 0xffffffc1;
    long second = 0x92009019;

    //correct
    __int64 correct = (((__int64)first << 32) | 0x00000000ffffffff) & (0xffffffff00000000 | second);
    //output is 0xffffffc192009019;

    //incorrect
    __int64 wrong = (double)(((__int64)first << 32) + second);
    //output is 0xffffffc092009019;
}

Why does the add operation affect the upper 4 bytes, and how?

(compiler is VC++ 2003)

3
  • The cast to double shouldn't be there, right? Commented Apr 18, 2011 at 13:54
  • 1
    Note that main must never return void! Its return type is always int. Other compilers correctly flag this as an error. Commented Apr 18, 2011 at 14:06
  • the cast to double from __int64 wrong = (double)... can be removed, it does not affect the outcome Commented Apr 18, 2011 at 15:29

2 Answers

4

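Probably because second is signed, which means that 0x92009019 is negative. When it is widened to 64 bits for the addition it is sign-extended to 0xffffffff92009019, so the add also changes the upper 32 bits.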
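To make the effect concrete, here is a minimal sketch of the same arithmetic in standard C++. It assumes `long` is 32 bits wide (as on VC++) and uses `<cstdint>` types in place of `__int64`; the helper names `widen_signed` and `add_signed_halves` are made up for this illustration and are not from the question.

```cpp
#include <cstdint>

// Reinterpret a 32-bit pattern as a signed 32-bit value and widen it to
// 64 bits, the way a 32-bit signed long is widened to __int64.
// Returns the raw 64-bit pattern. (The unsigned-to-signed conversion is
// modular on mainstream compilers; that is an assumption of this sketch.)
std::uint64_t widen_signed(std::uint32_t bits) {
    std::int32_t as_signed = static_cast<std::int32_t>(bits);
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(as_signed));
}

// The question's "wrong" expression without the cast to double, written
// with unsigned arithmetic so the shift and the add are well defined.
std::uint64_t add_signed_halves(std::uint32_t first, std::uint32_t second) {
    return (widen_signed(first) << 32) + widen_signed(second);
}
```

With the question's inputs, `widen_signed(0x92009019u)` is 0xffffffff92009019, and adding that to 0xffffffc100000000 gives 0xffffffc092009019: the 0xffffffff in the upper half acts as -1 on the upper 32 bits.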

EDIT: The question actually contains two questions.

1) How do you join two 32 bit numbers to a 64 bit value?

Answer:

(((uint64_t)first) << 32) | (uint32_t)second 

2) Is it wise to do bit operations using the floating-point type double?

Answer: No, it's not. Please use the right tool for the job. If you want to do bit operations, use integers. If you want (almost) continuous values, use floating-point values.
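As a rough sketch of the first answer, the join can be wrapped in two small functions. The names `join32` and `join_signed` are hypothetical, and the `<cstdint>` types are assumed to be available (they are not on VC++ 2003, where `unsigned __int64` and `unsigned __int32` would be the equivalents).

```cpp
#include <cstdint>

// Join two 32-bit halves into one 64-bit value. Both halves are unsigned,
// so no sign extension can leak into the upper 32 bits.
std::uint64_t join32(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Same join for signed 32-bit inputs: convert each half to its unsigned
// 32-bit pattern first, and only then widen to 64 bits.
std::uint64_t join_signed(std::int32_t high, std::int32_t low) {
    return join32(static_cast<std::uint32_t>(high),
                  static_cast<std::uint32_t>(low));
}
```

The order of the casts is the point: going to the unsigned 32-bit type before widening keeps the low half inside the lower 32 bits, whereas casting a negative signed value straight to a 64-bit type sign-extends it.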


5 Comments

Actually, they both are negative, no?
Yes, they are both negative. However, when you cast first to the 64-bit type and left-shift it, you get the same underlying bit pattern as if it had been unsigned. Later, when you add the signed second, it will affect the upper part if it is negative. If you added it as an unsigned value, it would always fit nicely into the lower 32 bits that were cleared by the left shift of first. In fact, I would have written | instead of + to indicate that I'm joining bit patterns rather than doing normal arithmetic.
(unsigned __int64)first << 32 | (unsigned __int64)second; gives me 0xffffffff92009019...uint64_t fails to compile on VC++ 2003
Try (unsigned __int32)second (not 64), which should correspond to what I wrote in my answer. I used uint32_t etc. in my answer as those correspond to the standard type names (which VC++ 2003 doesn't follow). By casting second directly to a 64-bit type, you still get the sign extension that, as I explained earlier, caused your original problem.
works OK. I expected (uint64_t)0x92009019 to yield 0x0000000092009019, but it gives 0xffffffff92009019...
3

A double has 53 bits of precision. I'm quite surprised you got the last digits right. (The first wrong digit is explained by Lindydancer.)

Edit: I'm no longer surprised: as the result is negative, you need only 38 bits of precision with your data. If you use

first = 0xffdfffc1; 

you lose the least significant bit with the double solution.
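A small round-trip check illustrates the point. This is only a sketch: it assumes an IEEE 754 binary64 double (53-bit significand) without excess intermediate precision, and the helper name `survives_double` is invented for the example.

```cpp
#include <cstdint>

// True if v comes back unchanged after a trip through double.
// Only meaningful while the converted value stays inside the int64
// range; values right next to the limits are outside this sketch.
bool survives_double(std::int64_t v) {
    double d = static_cast<double>(v);
    return static_cast<std::int64_t>(d) == v;
}
```

The question's result 0xffffffc092009019 is -0x3F6DFF6FE7 as a signed value, a 38-bit magnitude, so `survives_double(-0x3F6DFF6FE7LL)` holds. With first = 0xffdfffc1 the sum is -0x20003F6DFF6FE7, an odd 54-bit magnitude, which a double cannot represent exactly, so the check fails.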

4 Comments

you're on 54-bit machine? ;-)
yeah that too, long is 32-bits
The significand (aka mantissa) of the most common double format has 53 bits of precision.
A double is a floating-point value (i.e. a number of digits plus an exponent). All other operations in the question are on integers. The comments on the question suggested that you had problems creating a 64-bit integer value even before the result was cast to the floating-point value. The best way to sort out what you really want to know is to split this question in two: 1) How do you join two 32-bit integers into a 64-bit integer, and 2) is it wise to do bit operations on a double value (short answer: no, it's not).
