1

I am trying to generate a hash code from two integer inputs. The approach outlined in

Combining Java hashcodes into a "master" hashcode

seems to work well for many input values. However, when one of the input integers is int.MinValue, the behavior seems less than ideal. Specifically I observe

int.MinValue * 1013 == int.MinValue int.MinValue * 1009 == int.MinValue 

but

int.MinValue * 2 == 0 int.MinValue * 20 == 0 

All of this is in an unchecked context.

I would naively (and wrongly) assume that int.MinValue * (something other than 1 or 0) would yield a new bit pattern different than int.MinValue or 0.

Questions

  1. Why does multiplying int.MinValue by these constants yield int.MinValue (2 cases) or 0 (2 cases)?
  2. Does the behavior of int.MinValue indicate a flaw in the hash algorithm?
2
  • It might indicate a flaw in that hash algorithm, but not all hash algorithms are implemented with only multiplication. In fact, take a look at how .NET usually does it, by adding in one prime number and multiplying with another. The addition might indicate a fix for this "flaw". Commented Jul 1, 2014 at 6:42
  • @LasseV.Karlsen: It seems this method of creating a hash is considered flawed relative to other alternatives. A Rotating Hash seems like a more appropriate choice eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx Commented Jul 1, 2014 at 19:43

1 Answer 1

2

Multiplication is more or lest shift of bits to left. Since int.MinValue is 0x80000000 (only one highest bit set) multiplication can produce only two int values - 0 (if multiplying by even number ) or value with highest bit still set (for odd numbers).

Sample for 4 bit numbers (x,y,z - any value for particular bit, 1000 is equivalent of int.MinValue )

1000 * xyz1 = (xyz0 * 1000) + 1000 * 1 = (xyz * 10000) + 1000 * 1 = (xyz * 0) + 1000 = 1000 1000 * xyz0 = (xyz * 10000) + 1000 * 0 = 0 
Sign up to request clarification or add additional context in comments.

3 Comments

So this means that for prime P if P * inputValue < int.MinValue (when looking at arbitrary precision results), the final result would be int.MinValue (when looking at Int32 results), since P must be odd for P > 2? This seems to indicate a large number of hash collisions for values near int.MinValue. Would it be better to perform the math in a long then cast back to int before returning a hash value?
I'm not good with probabilities nor hashing expert, so I can't say how serious the problem is. To me it feels not very common - you either hash once (and should get random distribution) or combine multiple hash values - getting all hashes to be close to int.MinValue don't feel very likely. I've added "hashcode" tag so there are more useful related questions - check out stackoverflow.com/questions/34595/… as it seem to have reasonable discussion/links on hashing in general.
I think for my purposes, the edge cases will not matter. For general purposes, I learned that this method of combining hashes is not very good and would use something more like a Rotating Hash eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.