7

I understand that this seems to be already discussed and the answer is yes, String.hashCode can generate equal vales for different strings, but quite unlikely (Can Java's hashCode produce same value for different strings?). However it does happen in my application.

The following code will produce the same hashcode: -347019262 (jave 1.7.25)

String string1 = "/m/06qw_"; String string2="/m/0859_"; System.out.println(string1+","+string1.hashCode()); System.out.println(string2+","+string2.hashCode()); 

I do need hashcode in this case, and I want to use it to generate a unique primary key for a string. it seems that I am not doing it right. Any suggestions please?

Many thanks!

6
  • The hash code of 2 string can be same but not equals Commented Mar 11, 2014 at 10:52
  • You can try something with System.identityHashCode(). I think that it is supposed to be unique among the JVM (But I am not sure about that so I don't post it as an answer). Commented Mar 11, 2014 at 10:55
  • @ArnaudDenoyelle basically, this will return the result of Object's .hashCode() on any instance Commented Mar 11, 2014 at 10:55
  • Javadoc about Object.hashCode() says class Object does return distinct integers for distinct objects so calling System.identityHashCode() would work in a single JVM. Commented Mar 11, 2014 at 11:03
  • 2
    you cannot ensure unique hashCodes, nor do you need to, nor would it prevent collisions in a hash map for example. Unique hashCodes can still give you collisions. Commented Mar 11, 2014 at 11:11

5 Answers 5

13

You misunderstand .hashCode().

One part of the contract is that objects who are equals() must have the same hashCode(). However, the reverse is not true: two objects who have the same hashCode() do not have to be equals().

This is a valid, albeit perfectly useless, hashCode() implementation:

@Override public int hashCode() { return 42; // universal answer } 

You should use the string itself as the "primary key". If you want a "more efficient" key, you should consider what format the input string is and, if possible, extract a significant part of this input.

Sign up to request clarification or add additional context in comments.

Comments

4

The sensible option is to use the string as the primary key. (Another choice would be to associate a GUID with your data record and have that as the primary key.)

Hashing is meant to be (1) fast and (2) such that two equal strings will have the same hash code.

I'd submit it's likely that you'll get hashing clashes; after all an int (the hash return type) only has about 4 billion distinct values.

Comments

4

I do need hashcode in this case, and I want to use it to generate a unique primary key for a string. it seems that I am not doing it right. Any suggestions please?

You should always be cautious about using hash values primary keys. They are not unique. And the smaller the range of the hash function, the worse the problem is.

In your case, hashcode (and the identityHashcode() method suggested in a comment) generate a 32 bit value. For any pair of two different randomly generated strings, there is a chance of 1 in 2^32 that the hashcodes will be the same. This is true for any method of generating (32 bit) hash codes.

Now a chance of (roughly) 1 in 2 billion does not sound like much. But you don't need just pair-wise uniqueness. You actually need all of your strings' hashcodes to be unique ... because you are attempting to use the hashcodes as primary keys, and primary keys must be unique. And the table on the Wikipedia page "birthday problem" says that you only need roughly 50,000 keys before the probability of a collision rises to 1 in 4. (Yes ... ONE in FOUR!)

In short, don't use hashcode() values as primary keys.

The same table, indicates a good hash function that generated 128 bit hash values would probably be good enough to avoid collisions. But checkout the probabilities for yourself and make your own judgement.

Comments

2

You can use SHA1 hash algorithm to reduce collision probability. Take a look of this snippets to see, how to calculate SHA1 hash in Java: http://www.sha1-online.com/sha1-java/

1 Comment

Thank you all, I think I can make good use of all your comments.
2

You could use

System.identityHashcode(Object); 

to get the unique results.

EDIT

I thought that may be the murmur hash guava's implementation could help here also:

 HashFunction hash = Hashing.murmur3_128(); hash.hashString("/m/06qw_", Charset.defaultCharset()).asInt(); 

Generally murmur hash is supposed to be fast and reliable.

4 Comments

+1, but you could still get clashes: e.g. in an rmi environment.
Except that this will produce different results for "test" and new String("test")...
@fge agree. May be murmur hash could help also
Good one about using murmur, however Charset.defaultCharset() not so good: it depends on the JVM

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.