0

Both User.ID and Group.ID are Int16 and immutable and I want to generate an optimal HashCode.

This is the equality

 public override bool Equals(Object obj) { //Check for null and compare run-time types. if (obj == null || GetType() != obj.GetType()) return false; UserGroup item = (UserGroup)obj; return (User.ID == item.User.ID && Group.ID == item.Group.ID); } 

What would be an optimal GetHashCode. Right now I am using the following but only because I saw it as an example. The primary use of the Object is in a HashSet and that HashSet gets a lot of .Select(x => x.User.ID = y) or .Select(x => x.Group.ID = y).

 public override int GetHashCode() { return (int)User.ID ^ (int)Group.ID; } 
1
  • 1
    Instead of GetType() != obj.GetType(), please write !(obj is UserGroup). Commented Apr 1, 2012 at 16:25

2 Answers 2

1

Never skip the opportunity to generate a perfect hash. Two 16-bit shorts fit in a 32-bit int:

 public override int GetHashCode() { return (User.ID & 0xffff) + (Group.ID << 16); } 

The first part of the expression isolates the lower 16 bits, the & operator is necessary to make it work properly for negative values so only bits 0 through 15 are used. The second part moves the 16 bits of Group.ID to bit positions 16 through 31. Adding them combines the two sets of 16 bits to make a 32-bit number.

Sign up to request clarification or add additional context in comments.

2 Comments

Don't fully understand this answer either but in testing with several values it did yield unique results. Those same values using XOR did not yield unique results so I is clearly a better hash. I posted another question an packing two 32 into a 64 and then extracting if you want to pick up a quick answer.
It's not just a better hash, it is a perfect hash :)
0

Well, actually neither is preferred if you want good hash distribution (just one illustration: with your implementation, User.ID = 10, Group.ID = 5 will hash to the same value as User.ID = 5, Group.ID = 10). It's usable, of course, but take a look at what the master had to say about this.

2 Comments

I saw that and trust Keets as the right answer but to be honest I did not understand it.
Well, the (vastly simplified) point is that you want to avoid different inputs to generate the same hash code (a "hash collision"), because then the performance of any hash-based algorithm or class will begin to degenerate from O(1) towards O(n). Hans quite nailed it with his answer, because in your use case it is possible to generate a hash code that is guaranteed to be unique for each unique input combination (it's a "perfect hash"): you have two 16-bit integers and the hash code is 32 bits, so it can accommodate all possible combinations of your inputs.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.