1

I need a hash function for a string (of bytes) that

  1. Has a low collision ratio (even for short strings)

  2. Can be computed quickly (O(n) time is a must but I want it as fast as possible)

  3. Given hash(string1) and hash(string2), computing hash(append(string1, string2)) can be done in O(1).

The best I could come up with so far is this: (in Java pseudocode)

public static int[] HASH_ENTROPY = new int[] {...} // 255 large prime numbers public int hash() int hash = 0; for (int i=0; i < this.array.length; i++) hash += HASH_ENTROPY[this.array[i] + 128]; return hash; 

Are there any better algorithms? This one performs well with #1 and #3 but I'm wondering if it's too slow having to access the random elements in the array.

5
  • This is not dissimilar to RSA and other similar approaches, instead of just 'large' primes, you could use primes that are relatively prime to a lot of things (i.e. 3, 17, etc.) I know Ramanujan also came up with a very good number for hashing (can't think of it off the top of my head.) Commented Dec 8, 2013 at 2:32
  • That algorithm is order-independent, which means that anagrams will collide; that seems to contradict desire #1, although it's tricky to get #3 to be so fast with an order-dependent hash algorithm. I really don't think the speed of looking up the entropy values matters, but on the other hand, they're not helping you much either since. Commented Dec 8, 2013 at 3:01
  • @nikdeapen For what purpose u are using Hashing (security or lookup)? What is the hash key length needed? Commented Dec 8, 2013 at 6:02
  • @VikramBhat not security, just hashmap lookup Commented Dec 9, 2013 at 0:19
  • @rici anagrams happen so rarely in application that i'm not worried about them Commented Dec 9, 2013 at 0:21

2 Answers 2

1

I suggest you to use:

public uint32_t hash() uint32_t hash = 0x1f351f35; // 2x Barker code for (int i=0; i < this.array.length; i++) { char c = this.array[i]; hash = ((hash << 1) | (hash >> 31)) + (HASH_ENTROPY[(uint8_t)(hash + c)] ^ c); } return hash; 
Sign up to request clarification or add additional context in comments.

3 Comments

This works good but it is about 4-5 times slower than the one I gave. Is there any substantial benefit to using this function instead?
benefits - resistible to anagrams hash("ab") != hash("ba"); Snowcrash effect, when change 1 bit in the input data changes ~50% bits in output.
How you will do: Given hash(string1) and hash(string2), computing hash(append(string1, string2)) can be done in O(1). ?
0

Also, if you need quick computing time, you can consider another hash function:

public uint32_t hash() uint32_t hash = 0x1f351f35; // 2x Barker code for (int i=0; i < this.array.length; i++) { hash += (hash << 4) + this.array[i]; } return hash; 

Important: Previous hash function uses entropy array; you can populate this array by random values at each program start, so there will be universal hashing, resistant from external attack, when somebody outside especially generates many strings with same hash, for produce collision and DOS of your service. This function is quick, but isn't resistible for evil attack.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.