
I'm trying to get the hash of a file as fast as possible. I have a program that hashes large data sets (100GB+) made up of files of widely varying sizes (anywhere from a few KB up to 5GB+ per file), with anywhere from a handful of files up to several hundred thousand files per set.

The program must support all Java supported algorithms (MD2, MD5, SHA-1, SHA-256, SHA-384, SHA-512).

Currently I use:

    /**
     * Gets the hash of a file.
     *
     * @param file     String path + filename of file to get hash.
     * @param hashAlgo Hash algorithm to use. <br/>
     *                 Supported algorithms are: <br/>
     *                 MD2, MD5 <br/>
     *                 SHA-1 <br/>
     *                 SHA-256, SHA-384, SHA-512
     * @return String value of hash. (Variable length dependent on hash algorithm used)
     * @throws IOException       If file is invalid.
     * @throws HashTypeException If no supported or valid hash algorithm was found.
     */
    public String getHash(String file, String hashAlgo) throws IOException, HashTypeException {
        StringBuffer hexString = null;
        try {
            MessageDigest md = MessageDigest.getInstance(validateHashType(hashAlgo));
            FileInputStream fis = new FileInputStream(file);

            byte[] dataBytes = new byte[1024];

            int nread = 0;
            while ((nread = fis.read(dataBytes)) != -1) {
                md.update(dataBytes, 0, nread);
            }
            fis.close();

            byte[] mdbytes = md.digest();

            hexString = new StringBuffer();
            for (int i = 0; i < mdbytes.length; i++) {
                // Zero-pad each byte to two hex digits; Integer.toHexString alone drops leading zeros.
                hexString.append(String.format("%02x", 0xFF & mdbytes[i]));
            }

            return hexString.toString();
        } catch (NoSuchAlgorithmException | HashTypeException e) {
            throw new HashTypeException("Unsupported Hash Algorithm.", e);
        }
    }

Is there a more optimized way to get a file's hash? I'm looking for extreme performance and am not sure I have gone about this the best way.

    Have you profiled the code? Where does it spend most of its time? Commented Apr 10, 2013 at 17:49

2 Answers

5

I see a number of potential performance improvements. One is to use StringBuilder instead of StringBuffer; it's source-compatible but faster because it's unsynchronized. A second (much more important) would be to use FileChannel and the java.nio API instead of FileInputStream, or at least to wrap the FileInputStream in a BufferedInputStream to optimize the I/O.
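
As a rough illustration of both suggestions, here is a minimal sketch that reads through a FileChannel into a reusable ByteBuffer and builds the hex string with a StringBuilder. The class name NioFileHasher, the helper toHex, and the 1 MB buffer size are assumptions made for this example, not part of the original code. It also takes the algorithm name directly rather than going through the question's validateHashType. Whether it actually beats a FileInputStream with a larger read buffer depends on the disk and the JVM, so it is worth measuring both.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the name and buffer size are illustrative only.
class NioFileHasher {
    // Assumed buffer size (1 MB); tune it by measurement on the target hardware.
    private static final int BUFFER_SIZE = 1 << 20;

    static String hash(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
        // try-with-resources closes the channel even if a read fails.
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip();      // switch the buffer from filling to draining
                md.update(buffer);  // consumes all remaining bytes in the buffer
                buffer.clear();     // make the whole buffer available for the next read
            }
        }
        return toHex(md.digest());
    }

    // Unsynchronized StringBuilder, two zero-padded lowercase hex digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }
}
```

A call such as NioFileHasher.hash(Paths.get("some-file"), "SHA-256") returns the digest as a lowercase hex string, where "some-file" stands in for a real path.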


1

In addition to Ernest's answer: I think the result of MessageDigest.getInstance(validateHashType(hashAlgo)) can be cached in a thread-local HashMap with validateHashType(hashAlgo) as the key. Creating a MessageDigest takes time, but you can reuse an instance by calling its reset() method right after getting it from the map.

See the javadoc of java.lang.ThreadLocal
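
The caching idea could be sketched roughly as follows. The class name DigestCache and the method get are hypothetical, and the plain algorithm name is used as the map key in place of the question's validateHashType(hashAlgo), which is not shown here. ThreadLocal.withInitial assumes Java 8 or later; on Java 7 the same effect comes from overriding initialValue().

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-thread cache of MessageDigest instances, keyed by algorithm name.
class DigestCache {
    // MessageDigest is not thread-safe, so each thread gets its own map.
    private static final ThreadLocal<Map<String, MessageDigest>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    static MessageDigest get(String algorithm) throws NoSuchAlgorithmException {
        Map<String, MessageDigest> perThread = CACHE.get();
        MessageDigest md = perThread.get(algorithm);
        if (md == null) {
            // First request for this algorithm on this thread: create and remember it.
            md = MessageDigest.getInstance(algorithm);
            perThread.put(algorithm, md);
        }
        // Discard any state left over from a previous, possibly unfinished, use.
        md.reset();
        return md;
    }
}
```

With this in place, the hashing method would call DigestCache.get(...) where it currently calls MessageDigest.getInstance(...). For multi-gigabyte files the saving is likely small next to the I/O cost. It should matter most when hashing several hundred thousand small files.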
