SHA1 encoding in Haskell

Question

I have a list of file paths and want to compute the SHA1 hash of each file, collecting the results in a list again. It should be as general as possible, so the files could be text as well as binary files. My questions are:

What packages should be used and why?
How consistent is the approach? By that I mean: could the results differ from those of other programs that compute SHA1 themselves (e.g. sha1sum)?

I can't judge the quality of the implementations, but there are several implementations of SHA1 in packages on Hackage (section Cryptography). By the definition of SHA1, it works on the bytes of the file, so whether it's text or binary doesn't matter, and all correct implementations give the same result for the same file.
– Daniel Fischer, Commented Feb 29, 2012 at 16:36

hammar · Accepted Answer · 2012-02-29 17:06:47Z

The cryptohash package is probably the simplest to use. Just read your input into a lazy¹ ByteString and use the hashlazy function to get a ByteString with the resulting hash. Here's a small sample program which you can use to compare the output with that of sha1sum.

    import Crypto.Hash.SHA1 (hashlazy)
    import qualified Data.ByteString as Strict
    import qualified Data.ByteString.Lazy as Lazy
    import System.Process (system)
    import Text.Printf (printf)

    hashFile :: FilePath -> IO Strict.ByteString
    hashFile = fmap hashlazy . Lazy.readFile

    toHex :: Strict.ByteString -> String
    toHex bytes = Strict.unpack bytes >>= printf "%02x"

    test :: FilePath -> IO ()
    test path = do
      hashFile path >>= putStrLn . toHex
      system $ "sha1sum " ++ path
      return ()

Since this reads plain bytes, not characters, there should be no encoding issues and it should always give the same result as sha1sum:

    > test "/usr/share/dict/words"
    d6e483cb67d6de3b8cfe8f4952eb55453bb99116
    d6e483cb67d6de3b8cfe8f4952eb55453bb99116 /usr/share/dict/words

This also works for any of the hashes supported by the cryptohash package. Just change the import to e.g. Crypto.Hash.SHA256 to use a different hash.
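For the original use case (a list of file paths in, a list of hashes out), the approach above can be mapped over the list. The following is only a sketch under a few assumptions: `hashFiles` and `main` are names introduced here for illustration, the paths are taken from the command line, and each hash is forced with `evaluate` so that a lazily read file is consumed to the end before the next one is opened.

```haskell
import Control.Exception (evaluate)
import Crypto.Hash.SHA1 (hashlazy)
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Lazy as Lazy
import System.Environment (getArgs)
import Text.Printf (printf)

-- Hash one file. 'evaluate' forces the digest right away, so the lazy
-- read runs to end of file here rather than at some later point.
hashFile :: FilePath -> IO Strict.ByteString
hashFile path = Lazy.readFile path >>= evaluate . hashlazy

-- Render the raw digest bytes as lowercase hexadecimal.
toHex :: Strict.ByteString -> String
toHex bytes = Strict.unpack bytes >>= printf "%02x"

-- Hypothetical helper: hash every file, keeping the input order.
hashFiles :: [FilePath] -> IO [Strict.ByteString]
hashFiles = mapM hashFile

main :: IO ()
main = do
  paths  <- getArgs
  hashes <- hashFiles paths
  -- Print one "hash path" line per file, similar to sha1sum.
  mapM_ (\(h, p) -> putStrLn (toHex h ++ " " ++ p)) (zip hashes paths)
```

Forcing each digest is a design choice rather than a requirement: without it, the hashes would stay unevaluated until printed, and many files could be open at the same time.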

¹ Using lazy ByteStrings avoids loading the entire file into memory at once, which is important when working with large files.
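Bounded memory use can also be had without lazy I/O by reading the file in strict chunks and feeding them to the hash incrementally. This is a minimal sketch, assuming the incremental interface of `Crypto.Hash.SHA1` (`init`, `update`, `finalize`); the name `hashFileChunked` and the 64 KiB chunk size are illustrative choices and not part of the answer above.

```haskell
import qualified Crypto.Hash.SHA1 as SHA1
import qualified Data.ByteString as Strict
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hash a file by reading strict chunks of at most 64 KiB.
-- The handle is closed by 'withBinaryFile' when hashing finishes.
hashFileChunked :: FilePath -> IO Strict.ByteString
hashFileChunked path =
  withBinaryFile path ReadMode (\handle -> go handle SHA1.init)
  where
    go handle ctx = do
      chunk <- Strict.hGetSome handle 65536
      if Strict.null chunk
        -- An empty chunk signals end of file: produce the digest.
        then return (SHA1.finalize ctx)
        -- '$!' forces the updated context so that chunks are not
        -- retained in a chain of unevaluated updates.
        else go handle $! SHA1.update ctx chunk
```

The result should be identical to that of `hashlazy` on the same file, since both feed the same bytes to the same algorithm.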

Ingun전인건 · Answer · 2019-12-20 04:04:00Z

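@hammar's answer is excellent, but you can use the base16-bytestring library instead of writing your own toHex. Note that B16.encode returns a ByteString rather than a String, so it has to be printed with the ByteString version of putStrLn.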

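    import qualified Data.ByteString.Base16 as B16
    import qualified Data.ByteString.Char8 as Char8

    -- B16.encode yields a ByteString, so print it with Char8.putStrLn
    hashFile path >>= Char8.putStrLn . B16.encode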
