2

The history why is long, but the problem is simple. Having 3 strings I need to cache the matching value. To have a fast cache I use the following code:

public int keygen(string a, string b, string c) { var x = a + "@@" + b + "@@" + c; var hash = x.GetHashCode(); return hash; } 

(Note that string a,b,c does not contain the code "@@") The cache it self is just a Dictionary<int, object>

I know there is a risk that the hash key might be non unique, but except this:

Does anyone know a faster way to make an int key? (in C#) This operation takes ~15% of total CPU time and this is a long running app.

I have tried a couple of implementations but failed to find any faster.

6
  • Have you tried String.Concat(, String.Format( and StringBuilder yet? Commented Oct 8, 2013 at 14:25
  • 1
    Are you trying to produce a unique integer key? Hashing doesn't necessarily do this, even though it is most likely unique. Commented Oct 8, 2013 at 14:27
  • I'm afraid I wouldn't know of any way of boosting GetHashCode's performance, but I think it's possible you're going for some type of premature optimization here. Logic would dictate comparing two strings to be a time-consuming operation (hence you'd use hashes) but I think C# automatically does something like what you're attempting - so comparing the entire text of your question, to itself, should take the same time as comparing "hi" to "hi". I'd like confirmation of this from another poster or two though... Commented Oct 8, 2013 at 14:28
  • What sorts of lengths can we expect to see for the three strings? Commented Oct 8, 2013 at 14:46
  • Hashing speed has an inherent trade-off with hashing quality. The hash hash = (int)a[0]; is very fast, but also of very low quality. Commented Oct 8, 2013 at 15:36

6 Answers 6

5

You should use a Dictionary<Tuple<string,string,string>, object>. Then you don't have to worry about non-uniqueness, since the Dictionary will take care of it for you.

Sign up to request clarification or add additional context in comments.

6 Comments

This is the only solution that is not vulnerable to collisions. Compared to using hashkey of string as key in a Dictionary<int, object>
@TimSchmelter There can still be hash collisions using this implementation but they are handled (with equals check) internally in the dictionary.
@Magnus: I've deleted that comment because you're right. It uses the same algorithm as in J.Skeets famous GetHashCode implementation. So collisions are possible (although unlikely).
@TimSchmelter Did a test with a collection of 10.000 random anonymous objects and got collations on new { a = 1228545931, b = 694246649, c = 1459763042 }.GetHashCode() and new { a = 429549083, b = 16955719, c = 1653772628 }.GetHashCode() for example. So not totally unlikely.
@Magnus: I haven't said that it impossible but not very likely which is of course subjective. I have also checked it with a random collection but i needed 100000 random hashcodes to get a single collision. By the way, then i also got a collision with Tuple.GetHashCode.
|
4

Instead of concatenating strings (which creates new strings) you could use XOR or even better simple maths (credits to J.Skeet):

public int keygen(string a, string b, string c) { unchecked // Overflow is fine, just wrap { int hash = 17; hash = hash * 23 + a == null ? 0 : a.GetHashCode(); hash = hash * 23 + b == null ? 0 : b.GetHashCode(); hash = hash * 23 + c == null ? 0 : c.GetHashCode(); return hash; } } 

In general it's not necessary to produce unique hashs. But you should minimize collisions.

Another(not as efficient) way is to use an anonymous type which has a builtin support for GetHashCode:

public int keygen(string a, string b, string c) { return new { a, b, c }.GetHashCode(); } 

Note that the name, type and order matters for the calculation of the hashcode of an anonymous type.

7 Comments

What is special about the GetHashCode implementation of an anonymous type that it is able to avoid collisions?
@Lukazoid It doesn't avoid collisions entirely, it's just a type that has a sensibly written GetHashCode method that is based on the hashes of the underlying property values. It's likely to be more evenly distributed than a more naive implementation that you would write, and doesn't have as large of risks of making a mistake in the math that causes problems.
@Lukazoid: I've deleted that part since i guess that it also can produce collisions (f.e. if it overflows) since it uses the same algorithm as above.
@TimSchmelter It doesn't use exactly the same algorithm. It uses larger prime numbers and as such will be more likely to be evenly distributed over the range of an int. Of course there are obviously more possible values than there are integers, so avoiding collisions entirely is clearly impossible, they can only be minimized.
I think your return value on the second example should read return new[] { a, b, c }.GetHashCode();. Without the braces ([]), I get "Invalid anonymous type member declarator. Anonymous type members must be declared with a member assignment, simple name or member access."
|
3

A faster approach would be to compute the hash of each string separately, then combine them using a hash function. This will eliminate the string concatenation which could be taking time.

e.g.

public int KeyGen(string a, string b, string c) { var aHash = a.GetHashCode(); var bHash = b.GetHashCode(); var cHash = c.GetHashCode(); var hash = 36469; unchecked { hash = hash * 17 + aHash; hash = hash * 17 + bHash; hash = hash * 17 + cHash; } return hash; } 

4 Comments

Why is this faster than computing the hash code of one string?
The cost is in concatenating the 3 strings together to form one string. This eliminates that concatenation
I'm asking for proof or documentation that concatenating 3 strings and computing one hash code is faster that computing 3 hash codes and mathematically combining them. I don't know which way is faster, but I'd be shocked if the difference was significant.
@DStanley You are right, the performance difference is going to be largely dependent upon the sizes of the three strings. I will try and find some time to profile this and the original method.
1

I know there is a risk that the hash key might be non unique

Hash key's don't have to be unique - they just work better if collisions are minimized.

That said, 15% of your time spent computing a string's hash code seems VERY high. Even switching to string.Concat() (which the compiler may do for you anyways) or StringBuilder shouldn't make that much difference. I'd suggest triple-checking your measurements.

1 Comment

Well I do understand your comment, but sometimes u just have to port code. I am forced to make multiple lookups from code into files, and the cache simply cache values from the files.
1

I’d guess most of the time of this function is spent on building the concatenated string, only to call GetHashCode on it. I would try something like

public int keygen(string a, string b, string c) { return a.GetHashCode() ^ b.GetHashCode() ^ c.GetHashCode(); } 

Or possibly use something more complicated than a simple XOR. However, be aware that GetHashCode is not a cryptographic hash function! It is a hash function used for hash tables, not for cryptography, and you should definitely not use it for anything security-related like keys (as your keygen name hints).

5 Comments

He mentioned he is using it as a key for a dictionary (which has its own problems), not as a cryptographic key.
Hi I did try your code with a small modify. Since the order is of importance menaing (a,b,c) != (c,b,a) I changed the code with adding a random fixed number to each XOR term.
public int keygen(string a, string b, string c) { var x = (0xFC12341 + a.GetHashCode()) ^ (0xFC1252 + b.GetHashCode()) ^ (0xEA1454 + c.GetHashCode()); return x; }
This version did pass a long test with the same result as the old one I had. It is around 3-4 times faster. And VS does not highlight the string concat anymore :) So big THANKS!
There was an error in this. After some longer tests the new keygen failed. Testing with a="mort_basis_tbl" b = "makeham_a" c = "3" it gives the same key as a="mort_basis_tbl" b = "makeham_b" c = "0". So I am back with the orginal formula.
1

If you are one .Net 5 or later, a cleaner alternative is to use the built-in HashCode.Combine method.

This would lead to the following version of the original question:

public int keygen(string a, string b, string c) { return HashCode.Combine(a,b,c); } 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.