4

I have a curious problem and I am brainstorming possible solutions. The problem is such: I have a number of inputs (up to several thousand different ones), which basically differ in two-three arrays (arrays are a different size generaly, from size one up to couple thousand elements long). the functions which process arrays take some time to initialize data, so i thought to cache function/functor together with data and store them in map.

now, how do i go about converting raw arrays into usable hashtable type? i initially thought to read array into a string and use string as the key. is it good idea? do you have better suggestion?

4
  • Do you have a need to look up specific keys? I'm not sure what using a hash table is buying you here if you're just going to process one dataset after the next, which is what it sounds like. Commented Feb 18, 2010 at 2:14
  • @Joe yes, I do need to lookup keys. same data is processed using same function 10 or more times. Commented Feb 18, 2010 at 2:18
  • But how do you know which set you need to process again? What I'm not understanding from your description is how do you know in your logic which dataset you need to retrieve again? Commented Feb 18, 2010 at 2:43
  • Which grows faster, the number of elements or the size of the keys? If you hash the entire key, then computing the hash of a key is O(key size), which may be significant. If you don't hash the entire key, then keys that differ only at the end will collide. Commented Feb 18, 2010 at 3:35

1 Answer 1

4

Are these arrays integer? If yes, just go with something like this

hash = (hash + (324723947 + a[i])) ^93485734985;

Similar thing would work fine for strings if you do it on all characters. Finally, you may check out extra libs here

Sign up to request clarification or add additional context in comments.

4 Comments

double precision, so technically they can be integers.
Then it would work perfectly. I would still hash them as integers.
this hash function, what is the collision rate? I am a bit worried about that
For float numbers it is going to be nearly perfectly random. So collision rate would depend solely on your hash table size.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.