
What is the best way to calculate a single hash code from the values of a list of strings in one pass?

By "best" I mean that it needs to be:

1 - fast: I need to get the hash code for a huge list (10^3..10^8 items) of short strings.

2 - discriminating: it must identify the whole list of data, so lists that differ in only a couple of strings must have different hash codes.

How can I do this in Java?

Maybe there is a way to use the existing String hash code, but how do I merge many hash codes calculated for separate strings?

Thank you.

  • What do you want the hash code for? Do you just want one hash, or one for each string? Commented Feb 1, 2013 at 1:53
  • Do you want hash code values like the int that Java's existing hashCode() method on String returns, or do you want hash values like an MD5 digest? Commented Feb 1, 2013 at 2:03
  • Why not use the built-in hashCode() method? List implementations that extend AbstractList compute their hash code from the hash codes of their elements. Commented Feb 1, 2013 at 2:06
  • Must the hash code be order-sensitive? I.e., should the hash code for {"a", "b", "c"} be the same as or different from the hash code for {"a", "c", "b"}? Commented Feb 1, 2013 at 2:32
  • This question is way too ambiguous .... and the OP has not clarified it. Time to close it ... Commented Feb 1, 2013 at 3:09
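The built-in approach raised in the comments above can be sketched as follows. This is only an illustration, assuming an order-sensitive hash is wanted: it merges the per-string hashCode() values with the same recurrence that the List.hashCode() contract specifies. The class and method names (CombinedStringHash, combine) are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the JDK.
final class CombinedStringHash {

    // Merges the hash codes of all strings in one pass.
    // Uses the recurrence from the List.hashCode() contract:
    // start at 1, then h = 31 * h + elementHash for each element.
    static int combine(Iterable<String> strings) {
        int h = 1;
        for (String s : strings) {
            h = 31 * h + (s == null ? 0 : s.hashCode());
        }
        return h;
    }

    public static void main(String[] args) {
        List<String> abc = Arrays.asList("a", "b", "c");
        List<String> acb = Arrays.asList("a", "c", "b");
        // Same result as the list's own hashCode()
        System.out.println(combine(abc) == abc.hashCode());
        // Reordering the elements changes the result here
        System.out.println(combine(abc) != combine(acb));
    }
}
```

Because the result is a 32-bit int, distinct lists can still collide, and with up to 10^8 items per list that is worth keeping in mind if the hash is meant to identify a list.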

1 Answer


Create a wrapper class for your strings and then use the CRC32 class. It's simple and fast:

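    import java.nio.charset.StandardCharsets;
    import java.util.Collection;
    import java.util.zip.CRC32;

    public class HugeStringCollection {
        private final Collection<String> strings;

        public HugeStringCollection(Collection<String> strings) {
            this.strings = strings;
        }

        @Override
        public int hashCode() {
            CRC32 crc = new CRC32();
            for (String string : strings) {
                // Fixed charset so the result does not depend on the platform default
                crc.update(string.getBytes(StandardCharsets.UTF_8));
            }
            // getValue() returns the 32-bit checksum in a long; the cast keeps all 32 bits
            return (int) crc.getValue();
        }
    }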

If the collection itself is immutable, you can compute the hash once and store it for later reuse.
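A minimal sketch of that caching idea, assuming the strings never change after construction. The class name CachedStringCollection and its fields are hypothetical, and the defensive copy in the constructor is an addition for the example. It is not thread-safe as written.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical class illustrating a lazily computed, cached CRC32-based hash.
final class CachedStringCollection {
    private final List<String> strings; // defensive copy, never modified afterwards
    private int cachedHash;
    private boolean hashComputed;       // separate flag, since 0 is a valid hash value

    CachedStringCollection(Collection<String> strings) {
        this.strings = new ArrayList<>(strings);
    }

    @Override
    public int hashCode() {
        if (!hashComputed) {
            CRC32 crc = new CRC32();
            for (String s : strings) {
                crc.update(s.getBytes(StandardCharsets.UTF_8));
            }
            cachedHash = (int) crc.getValue();
            hashComputed = true;
        }
        return cachedHash;
    }

    public static void main(String[] args) {
        CachedStringCollection c =
                new CachedStringCollection(Arrays.asList("alpha", "beta", "gamma"));
        // The first call walks the whole list; the second returns the stored value.
        System.out.println(c.hashCode() == c.hashCode());
    }
}
```

Note that feeding the raw bytes of each string into one CRC32 means the element boundaries are not part of the input, so {"ab", "c"} and {"a", "bc"} produce the same value. If that matters, one option is to also update the checksum with a separator or the string length between elements.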


3 Comments

CRC32 has been widely used in file processing for years, e.g. in ZIP compression.
@mantrid How do you convert this to work for an ArrayList of Characters? I guess we don't have getBytes() for Character?
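String.join("", myArrayList).getBytes() would work for a list of strings, I think; for Character elements you would need to build the string first, since String.join only accepts CharSequence elements.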
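For the list-of-Character case asked about above, one possible approach is to append each character to a StringBuilder and feed the resulting bytes to CRC32. This is a sketch only: the names CharacterListCrc and crcOf are invented for the example, and it assumes the list contains no null elements.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper for hashing a List<Character> with CRC32.
final class CharacterListCrc {

    // Assumes no null elements; a null would cause a NullPointerException on unboxing.
    static int crcOf(List<Character> chars) {
        StringBuilder sb = new StringBuilder(chars.size());
        for (Character c : chars) {
            sb.append(c.charValue());
        }
        CRC32 crc = new CRC32();
        crc.update(sb.toString().getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        List<Character> chars = Arrays.asList('a', 'b', 'c');
        System.out.println(Integer.toHexString(crcOf(chars)));
    }
}
```

Building the whole string first costs extra memory for very large lists; updating the CRC32 in smaller chunks would avoid that at the price of slightly more code.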
