0

I am using uthash.h for a hash table implementation in C. I am using the hash table for a basic word-count exercise: I have a file containing words, and I have to count the frequency of each word. The implementation of uthash.h requires me to generate an integer id for each entry, and I wanted to calculate a unique integer corresponding to each string. I tried the MD5 hash algorithm, but it generates strings of digits and letters, so it is of no use. Can anybody suggest such an algorithm?

5
  • A good implementation of the md5 hash should be able to give you the raw 16-byte array. Split this into four 32-bit integers and XOR them together. That alphanumeric string is just a convenient representation for displaying the hash. Commented Feb 20, 2015 at 21:43
  • See stackoverflow.com/questions/16521148/… and stackoverflow.com/questions/1010875/…. Commented Feb 20, 2015 at 21:44
  • @user1929959, the second link that you mentioned has hashing functions that return unsigned long values, but in the uthash.h implementation the id needs to be an integer. I am not sure whether this will work or not. I will try this approach and post my results once done. In the meantime, if you have any more suggestions, please post them. Commented Feb 20, 2015 at 21:56
  • I heard murmur3 is pretty good for strings. Commented Feb 20, 2015 at 22:18
  • And please don't use md5 or any other cryptographic hash function for this. Their computation is much slower than good non-cryptographic hash functions. Commented Feb 20, 2015 at 22:19
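
The folding step described in the first comment can be sketched roughly as below. The function name `fold_digest16` is hypothetical, and the 16 digest bytes are assumed to come from whatever MD5 (or other 128-bit hash) library is in use; this sketch only covers the XOR step, not the digest computation itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a raw 16-byte digest into one 32-bit value by XOR-ing its four
 * 32-bit words. The bytes of each word are combined in big-endian order,
 * so the result does not depend on the host's endianness.
 * (fold_digest16 is an illustrative name, not part of any library.) */
uint32_t fold_digest16(const unsigned char digest[16])
{
    uint32_t folded = 0;

    for (size_t i = 0; i < 16; i += 4) {
        uint32_t word = ((uint32_t)digest[i]     << 24) |
                        ((uint32_t)digest[i + 1] << 16) |
                        ((uint32_t)digest[i + 2] <<  8) |
                         (uint32_t)digest[i + 3];
        folded ^= word;
    }
    return folded;
}
```

Folding 128 bits down to 32 discards information, so two different strings can end up with the same folded value even when their full digests differ.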

1 Answer

1

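Use Robert Sedgewick's algorithm for hashing:

    unsigned int GenerateHash(const char *str, unsigned int len)
    {
        unsigned int result = 0;
        unsigned int b = 378551;
        unsigned int a = 63689;
        unsigned int i = 0;

        for (i = 0; i < len; str++, i++) {
            /* Cast so bytes >= 0x80 are not sign-extended where char is signed. */
            result = result * a + (unsigned char)(*str);
            a = a * b;  /* the multiplier changes on every step */
        }
        return result;
    }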
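
As a rough illustration of how such a function might be called on a word, here is a minimal sketch. It restates the same multiplicative scheme with `uint32_t` so the values are identical on every platform; the names `rs_hash` and `hash_word` are hypothetical and not part of uthash or any other library.

```c
#include <stdint.h>
#include <string.h>

/* Same scheme as the answer's function, written with fixed-width
 * 32-bit arithmetic (unsigned overflow wraps modulo 2^32). */
uint32_t rs_hash(const char *str, size_t len)
{
    uint32_t result = 0;
    uint32_t b = 378551u;
    uint32_t a = 63689u;

    for (size_t i = 0; i < len; i++) {
        result = result * a + (unsigned char)str[i];
        a = a * b;
    }
    return result;
}

/* Hypothetical convenience wrapper for NUL-terminated words. */
uint32_t hash_word(const char *word)
{
    return rs_hash(word, strlen(word));
}
```

For example, `hash_word("a")` is 97, the byte value of `'a'`. No 32-bit hash can be unique across all strings, so a word counter that uses this value as an id should still compare the strings themselves when two ids match.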