3

I'm implementing a custom GetHashCode for the System.Drawing.Point class in C#. My method currently fails the following requirement:

var hashA = MyGetHashCode(new Point(1, 0)); var hashB = MyGetHashCode(new Point(0, 1)); var hashC = MyGetHashCode(new Point(0, 0)); var hashD = MyGetHashCode(new Point(1, 1)); Assert.AreNotEqual(hashA ^ hashB, hashC ^ hashD); 

To pass this test I'm sure that using new SHA256Managed().ComputeHash(currentHash) will do. But there is any other faster hashing algorithm? I know SHA256 is all about security, and I don't need that.

3
  • 1
    How did you come up with the idea that your hash function should pass that test? Commented Jun 8, 2009 at 13:08
  • @mquander Surely seems strange. But some other class Equals functions, relies on a simple GetHashCode implementation, which in its turn relies on my custom Point.GetHashCode method Commented Jun 8, 2009 at 13:16
  • @mquander It's all about not repeating code in Equals and GetHashCode, and making them equivalent. Commented Jun 8, 2009 at 13:19

7 Answers 7

6

A simple hash? how about something like:

 (17 * point.X) + (23 * point.Y); 

Or for more obvious entropy:

int hash = -1047578147; hash = (hash * -1521134295) + point.X; hash = (hash * -1521134295) + point.Y; 

(numbers from C#'s anonymous type code)

Sign up to request clarification or add additional context in comments.

3 Comments

Marc, this will surely fulfill the Assert but is not bounded (large X or Y will overflow)... if you allow overflow then it doesn't have a good distribution
It will wrap (unchecked); the distribution is, AFAIK, fine... This is the approach used by the C# compiler ;-p
Those numbers are just (as stated) the numbers that the C# compiler will use for an anon type of the form: new {X = 123, Y = 456}
3
  • Why are you doing this? Surely System.Drawing.Point has a fine hashing function already?

  • You understand that test doesn't represent a strict requirement, right? Hash codes don't have to be unique.

  • If you really want a really good hash of the coordinates in question, you might want to start at this page about hashing multiple integers.

3 Comments

Re "fine hashing function" - x ^ y... that isn't great; it means that anything on the diagonal is zero, and anything symmetric, i.e (5,7) and (7,5) - is equal.
It's not great, but unless you have a fairly pathological distribution of points, it's OK. I have the feeling the OP isn't working off any concrete performance requirement, if he's considering using an SHA hash, so I doubt anything better is required.
I answered the first question in the comments of the question.
1

I know this isn't going to answer your question, but I must mention for the sake of other readers that you are changing the default behavior of a built in method of the framework. As per the documentation:
http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx

The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.

4 Comments

Could you please use a blockquote, not a code block (prefix the quote with ">" instead of indenting)? It'll make it much easier to read.
That is no problem in this case. What they mean is that you should not use the default GetHashcode as a unique value, because different objects can return the same hash. When Jader implements it as a (practically) unique value (by using sha256 or whatever) it will not break anything. It will be just slow ...
To be fair, it will be quite slow if he actually has a big hash table of points -- slow enough that it's pretty ridiculous.
Yeah sure - it is not really clever and he should never use a sha256 or something similar for this purpose. But the statement by Joel is not right, nevertheless.
1

Here's an interesting article about hashing speed:

A Hash Function for Hash Table Lookup

Comments

1

A simple Elf hash implementation (it's in delphi, shoudl be easy to translate)

function ElfHash(id : string; tableSize : integer) : integer; var i : integer; h,x : longint; begin h := 0; // Obtener el valor numérico for i := 1 to Length(id) do begin h := (h shl 4) + Ord(id[i]); x := h and $F0000000; if x <;>; 0 then h = h xor (x shr 24) xor x; end; // Ajustar al tamaño de la tabla result := h mod tableSize; end; 

3 Comments

I thought I knew delphi, but I have no idea what <;>; means
:) Talk to the stackoverflow sanitizer code... That's ovbiously "<>"
Says "Easy to translate" while asking to do shift left, shift right, and exclusive or in managed code.
0

I don't know what your application is, but you may be looking for Zobrist hashing.

http://en.wikipedia.org/wiki/Zobrist_hashing

It can be updated incrementally, which makes it very fast.

Comments

0

If you know in advance that your point value is between 0 and N, you can use hashcode = X+Y*N; This is a rather obvious possible hash. It's not random at all, has ugly repetition, and is generally pretty silly. It's equivalent to concatenating the bits of your two points, assuming N is a power of 2. And it's got perfect uniform distribution and no collisions.

I've used this strategy to excellent effect in the past, but admit it has some real (but obvious) limitations. The biggest one being what happens when N is large enough that N^2 won't fit into your hash value (i.e. painful collisions.

1 Comment

My current implementation is ((x << 16) | (x >> 16)) ^ y (in C#), that fits in your description

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.