26

The MSDN documentation on Object.GetHashCode() describes 3 contradicting rules for how the method should work.

  1. If two objects of the same type represent the same value, the hash function must return the same constant value for either object.
  2. For the best performance, a hash function must generate a random distribution for all input.
  3. The hash function must return exactly the same value regardless of any changes that are made to the object.

Rules 1 & 3 are contradictory to me.

Does Object.GetHashCode() return a unique number based on the value of an object, or the reference to the object. If I override the method I can choose what to use, but I'd like to know what is used internally if anyone knows.

6 Answers 6

30

Rules 1 & 3 are contradictory to me.

To a certain extent, they are. The reason is: if an object is stored in a hash table and, by changing its value, you change its hash then the hash table has lost the value and you can't find it again by querying the hash table. — It is therefore important that while objects are stored in a hash table, they retain their hash value.

To ensure this it is often simplest to make hashable objects immutable, thus side-stepping the problem wholly. But it is actually sufficient to make only those fields immutable that determine the hash value.

Consider the following example:

struct Person { public readonly string FirstName; public readonly string Name; public readonly DateTime Birthday; public int ShoeSize; } 

People rarely change their name, and even less frequently their birthday. However, their shoe size may grow arbitrarily, or even shrink. It is therefore reasonable to identify people using their birthday and name but not their shoe size. The hash value should reflect this:

public override int GetHashCode() { return HashCode.Combine(FirstName, Name, Birthday); } 

And remember to override Equals whenever you override GetHashCode (and vice-versa)!

Sign up to request clarification or add additional context in comments.

6 Comments

Since all objects are hashable in C# (GetHashCode() is part of the very basic Object type), you would suggest to make all objects immutable - not very practical, isn't it?
@thewhiteambit No. I’m suggesting that not all objects are good candidates for hash table keys. Just because they can be hashed doesn’t mean that they should be. And the fact that GetHashCode is a method of the Object base class is simply a bad design decision in the C# language. Furthermore, my answer isn’t saying that you must make every hash table key type immutable — merely that doing so helps tremendously.
You are right, I was just pointing to the sentence "To realize this it is often simplest to make hashable objects immutable" - and since all objects are hashable (by bad design choice) this very attempt would make all hashable objects (equals all objects) immutable. But I guess you did not mean it that way. You probably meant just to make all objects one wants to store in Hash-Collections immutable.
@thewhiteambit Ah, yes. As in: if you have control over the object type you’re going to hash, (strongly) consider making it immutable.
@FarhadZamani Xor is a simple method of mixing hashes. It’s not very good — ideally you’d something else, e.g. HashCode.Combine but (IIRC) that wasn’t available yet when I wrote this answer.
|
5

Not sure what MSDN documentation you are referring to. Looking at the current documentation on Object.GetHashCode (http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx) provides the following "rules":

  • If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

  • The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

  • For the best performance, a hash function must generate a random distribution for all input.

If you are referring to the second bullet point, the key phrases here are "as long as there is no modification to the object state" and "true only for the current execution of an application".

Also from the documentation,

A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object. Hash functions are usually specific to each Type and must use at least one of the instance fields as input. [Emphasis added is mine.]

As for the actual implementation, it clearly states that derived classes can defer to the Object.GetHashCode implementation if and only if that derived class defines value equality to be reference equality and the type is not a value type. In other words, the default implementation of Object.GetHashCode is going to be based on reference equality since there are no real instance fields to use and, therefore, does not guarantee unique return values for different objects. Otherwise, your implementation should be specific to your type and should use at least one of your instance fields. As an example, the implementation of String.GetHashCode returns identical hash codes for identical string values, so two String objects return the same hash code if they represent the same string value, and uses all the characters in the string to generate that hash value.

1 Comment

That was the most verbose and confusing answer I've ever read. It left me even more confused than what I began with.
4

Rules 1 & 3 aren't really a contradiction.

For a reference type the hash code is derived from a reference to the object - change an object's property and the reference is the same.

For value types the hash code is derived from the value, change a property of a value type and you get a completely new instance of the value type.

Comments

1

A very good explanation on how to handle GetHashCode (beyond Microsoft rules) is given in Eric Lipperts (co. Designer of C#) Blog with the article "Guidelines and rules for GetHashCode". It is not good practice to add hyperlinks here (since they can get invalid) but this one is worth it, and provided the information above one will probably still find it in case the hyperlink is lost.

Comments

0

By default it does it based on the reference to the object, but that means that it's the exact same object, so both would return the same hash. But a hash should be based on the value, like in the case of the string class. "a" and "b" would have a different hash, but "a" and "a" would return the same hash.

Comments

0

I can't know for sure how Object.GetHashCode is implemented in real .NET Framework, but in Rotor it uses SyncBlock index for the object as hashcode. There are some blog posts about it on the web, however most of them are from 2005.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.