Hash Function for Strings

Question

I need a hash function for a string (of bytes) that

Has a low collision ratio (even for short strings)
Can be computed quickly (O(n) time is a must but I want it as fast as possible)
Given hash(string1) and hash(string2), computing hash(append(string1, string2)) can be done in O(1).

The best I could come up with so far is this: (in Java pseudocode)

public static int[] HASH_ENTROPY = new int[] {...} // 255 large prime numbers public int hash() int hash = 0; for (int i=0; i < this.array.length; i++) hash += HASH_ENTROPY[this.array[i] + 128]; return hash;

Are there any better algorithms? This one performs well with #1 and #3 but I'm wondering if it's too slow having to access the random elements in the array.

This is not dissimilar to RSA and other similar approaches, instead of just 'large' primes, you could use primes that are relatively prime to a lot of things (i.e. 3, 17, etc.) I know Ramanujan also came up with a very good number for hashing (can't think of it off the top of my head.) — Dylan Lawrence
– Dylan Lawrence, Commented Dec 8, 2013 at 2:32
That algorithm is order-independent, which means that anagrams will collide; that seems to contradict desire #1, although it's tricky to get #3 to be so fast with an order-dependent hash algorithm. I really don't think the speed of looking up the entropy values matters, but on the other hand, they're not helping you much either since. — rici
– rici, Commented Dec 8, 2013 at 3:01
@nikdeapen For what purpose u are using Hashing (security or lookup)? What is the hash key length needed? — Vikram Bhat
– Vikram Bhat, Commented Dec 8, 2013 at 6:02
@rici anagrams happen so rarely in application that i'm not worried about them — nikdeapen
– nikdeapen, Commented Dec 9, 2013 at 0:21

olegarch · Accepted Answer · 2013-12-08 05:38:39Z

1

I suggest you to use:

public uint32_t hash() uint32_t hash = 0x1f351f35; // 2x Barker code for (int i=0; i < this.array.length; i++) { char c = this.array[i]; hash = ((hash << 1) | (hash >> 31)) + (HASH_ENTROPY[(uint8_t)(hash + c)] ^ c); } return hash;

answered Dec 8, 2013 at 5:38

olegarch

3,9111 gold badge22 silver badges22 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

nikdeapen Over a year ago

This works good but it is about 4-5 times slower than the one I gave. Is there any substantial benefit to using this function instead?

olegarch Over a year ago

benefits - resistible to anagrams hash("ab") != hash("ba"); Snowcrash effect, when change 1 bit in the input data changes ~50% bits in output.

Hynek -Pichi- Vychodil Over a year ago

How you will do: Given hash(string1) and hash(string2), computing hash(append(string1, string2)) can be done in O(1). ?

olegarch · Accepted Answer · 2013-12-09 16:00:28Z

Also, if you need quick computing time, you can consider another hash function:

public uint32_t hash() uint32_t hash = 0x1f351f35; // 2x Barker code for (int i=0; i < this.array.length; i++) { hash += (hash << 4) + this.array[i]; } return hash;

Important: Previous hash function uses entropy array; you can populate this array by random values at each program start, so there will be universal hashing, resistant from external attack, when somebody outside especially generates many strings with same hash, for produce collision and DOS of your service. This function is quick, but isn't resistible for evil attack.

Collectives™ on Stack Overflow

Hash Function for Strings

2 Answers 2

3 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Related