
I have a list of file paths and want to compute the SHA1 hash of each file, collecting the hashes in a new list. It should be as general as possible, so the files may be text or binary. My questions are:

  1. What packages should be used and why?
  2. How consistent is the approach? That is, could the results differ from those of other programs that compute SHA1 (e.g. sha1sum)?
    I can't judge the quality of the implementations, but there are several SHA1 implementations in packages on Hackage (Cryptography section). SHA1 is defined over the bytes of the file, so whether it's text or binary doesn't matter, and all correct implementations give the same result for the same file. Commented Feb 29, 2012 at 16:36

2 Answers


The cryptohash package is probably the simplest to use. Just read your input into a lazy ByteString [1] and use the hashlazy function to get a strict ByteString containing the raw hash bytes. Here's a small sample program which you can use to compare the output with that of sha1sum.

    import Crypto.Hash.SHA1 (hashlazy)
    import qualified Data.ByteString as Strict
    import qualified Data.ByteString.Lazy as Lazy
    import System.Process (system)
    import Text.Printf (printf)

    hashFile :: FilePath -> IO Strict.ByteString
    hashFile = fmap hashlazy . Lazy.readFile

    toHex :: Strict.ByteString -> String
    toHex bytes = Strict.unpack bytes >>= printf "%02x"

    test :: FilePath -> IO ()
    test path = do
      hashFile path >>= putStrLn . toHex
      system $ "sha1sum " ++ path
      return ()

Since this reads plain bytes, not characters, there should be no encoding issues and it should always give the same result as sha1sum:

    > test "/usr/share/dict/words"
    d6e483cb67d6de3b8cfe8f4952eb55453bb99116
    d6e483cb67d6de3b8cfe8f4952eb55453bb99116  /usr/share/dict/words
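If sha1sum is not available, another way to check consistency is to compare against a published test vector. The following is a minimal sketch, assuming the cryptohash package is installed; the digest of the three ASCII bytes "abc" is the standard SHA-1 test vector, while `abcDigest` and `matchesTestVector` are hypothetical names introduced only for this illustration.

```haskell
import Crypto.Hash.SHA1 (hash)
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Char8 as Char8
import Text.Printf (printf)

-- Render each byte of the raw digest as two lowercase hex digits.
toHex :: Strict.ByteString -> String
toHex bytes = Strict.unpack bytes >>= printf "%02x"

-- Published SHA-1 digest of the three ASCII bytes "abc".
abcDigest :: String
abcDigest = "a9993e364706816aba3e25717850c26c9cd0d89d"

-- Hypothetical helper for this sketch (not part of cryptohash):
-- True when the library reproduces the published digest.
matchesTestVector :: Bool
matchesTestVector = toHex (hash (Char8.pack "abc")) == abcDigest
```

Evaluating `matchesTestVector` in GHCi should give `True` for any correct SHA1 implementation.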

This also works for any of the hashes supported by the cryptohash package. Just change the import to e.g. Crypto.Hash.SHA256 to use a different hash.

[1] Using lazy ByteStrings avoids loading the entire file into memory at once, which is important when working with large files.
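To get from a single file to the list of hashes asked about in the question, the same function can be mapped over the paths. This is only a sketch under a few assumptions: `hashFiles` is a hypothetical helper name, and `evaluate` is added here (it is not in the sample above) so that each digest is computed before the next file is opened. Without it, the lazy reads could leave many file handles open at once when the list is long.

```haskell
import Control.Exception (evaluate)
import Crypto.Hash.SHA1 (hashlazy)
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Lazy as Lazy
import Text.Printf (printf)

-- Hash one file. 'evaluate' forces the digest immediately, so the
-- lazily read contents are consumed to the end before the action returns.
hashFile :: FilePath -> IO Strict.ByteString
hashFile path = Lazy.readFile path >>= evaluate . hashlazy

-- Render each byte of the raw digest as two lowercase hex digits.
toHex :: Strict.ByteString -> String
toHex bytes = Strict.unpack bytes >>= printf "%02x"

-- Hypothetical helper (not part of cryptohash): hash every file in
-- order and return the hex digests in the same order as the paths.
hashFiles :: [FilePath] -> IO [String]
hashFiles = mapM (fmap toHex . hashFile)
```

For example, `hashFiles ["a.txt", "b.bin"] >>= mapM_ putStrLn` would print one digest per line, where the two file names are placeholders.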


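@hammar's answer is excellent, but instead of writing your own toHex you can use the Data.ByteString.Base16 module from the base16-bytestring package. Note that its encode function returns a ByteString rather than a String, so print the result with the ByteString version of putStrLn: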

    import qualified Data.ByteString.Base16 as B16
    import qualified Data.ByteString.Char8 as Char8

    hashFile path >>= Char8.putStrLn . B16.encode
