
I have looked through several topics about calculating checksums of files in Python but none of them answered the question about one sum from multiple files. I have several files in sub directories and would like to determine if there was any change in one or more of them. Is there a way to generate one sum from multiple files?

EDIT: This is the way I do it to get a list of sums:

checksums = [(fname, hashlib.md5(open(fname, 'rb').read()).digest()) for fname in flist] 
  • Sure! Using hashlib, simply call the hash object's .update method with the bytes of each file. But why bother? Simply hash each file separately, and see if any of the hashes have changed. That way, you also get the identity of which file(s) changed. But if you really want a multi-file hashing program, try writing it, and if you get stuck, post your code and I'll be happy to help. Commented Jan 15, 2016 at 9:39
  • FWIW, here's some Python 2 code I wrote for U&L that does simultaneous MD5 and SHA-256 digests of a file. It processes the file in blocks, so it can handle files that are too big to fit in memory. Commented Jan 15, 2016 at 9:39
  • Thanks for your input! I've added the code for multiple files. I assume I can use .update() instead of .digest(), but I'm not sure how. Do you mean calculate the hash for the first file like this: hash_obj = hashlib.md5(open(fname, 'rb').read()) and after that do hash_obj.update(fname)? Will that calculate the hash from the file contents or just the filename string? Commented Jan 19, 2016 at 14:52
  • Yes, you need to use the .update method to supply extra data to the hashlib object. The .digest and .hexdigest methods are simply output methods that give the digest of the data that's been fed so far to the hashlib object. I don't have time right now to go into further details or write any code. But I recommend that you don't try to do this all in a one-line list comprehension: it might save a tiny bit of time, but it makes the code hard to work with and hard to read. Commented Jan 19, 2016 at 15:01
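Putting these comments together, a block-wise version can feed every file in the list through a single hash object without ever reading a whole file into memory. This is a sketch, not anyone's posted answer; the helper name `combined_md5` is mine:

```python
import hashlib

def combined_md5(flist, block_size=65536):
    """One MD5 digest over the contents of several files, read in
    fixed-size blocks so large files never sit fully in memory."""
    h = hashlib.md5()
    for fname in flist:
        with open(fname, 'rb') as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                h.update(block)  # feed file contents, not the filename
    return h.digest()
```

Because `.update()` is fed the bytes read from each file, the result is identical to hashing the concatenation of all the file contents at once.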

3 Answers


Slightly cleaner than Artur's answer. There's no need to treat the first element specially.

Edit (2022): I know Python a bit better now so I updated the code as follows:

  • Use pathlib - it's more ergonomic and doesn't leave files open.
  • Add type hints. If you don't use these you're doing it wrong.
  • Avoid a very mild TOCTOU issue.
import hashlib
from pathlib import Path

def calculate_checksum(filenames: list[str]) -> bytes:
    hash = hashlib.md5()
    for fn in filenames:
        try:
            hash.update(Path(fn).read_bytes())
        except IsADirectoryError:
            pass
    return hash.digest()

(You can handle IsADirectoryError differently if you like.)
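One property of this approach worth keeping in mind: the combined digest depends on the order in which files are fed to the hash object, so pass the filenames in a stable (e.g. sorted) order if the checksum needs to be reproducible across runs. A quick demonstration:

```python
import hashlib

# Feeding the same two payloads in different orders yields different digests.
a, b = b'alpha', b'beta'

h1 = hashlib.md5()
h1.update(a)
h1.update(b)

h2 = hashlib.md5()
h2.update(b)
h2.update(a)

assert h1.digest() != h2.digest()
# Incremental updates are equivalent to hashing the concatenation at once.
assert h1.digest() == hashlib.md5(a + b).digest()
```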




So I made it :) This way one hash sum is generated for a file list.

hash_obj = hashlib.md5(open(flist[0], 'rb').read())
for fname in flist[1:]:
    hash_obj.update(open(fname, 'rb').read())
checksum = hash_obj.digest()

Thank you PM 2Ring for your input!

Note that MD5 is cryptographically broken, so use it only for non-security-critical purposes such as change detection.
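Since all hashlib constructors share the same interface, switching the snippet above to a collision-resistant algorithm like SHA-256 is a one-word change. A small sketch of the drop-in equivalence:

```python
import hashlib

# Same update/digest API; only the constructor differs.
data = b'some file contents'
md5_digest = hashlib.md5(data).hexdigest()
sha_digest = hashlib.sha256(data).hexdigest()

assert len(md5_digest) == 32  # 128-bit MD5 digest as hex
assert len(sha_digest) == 64  # 256-bit SHA-256 digest as hex
```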

2 Comments

Don't you need to close the files you're opening?
Fair point, probably the best way to do this would be to use open as a context manager. This is an old answer I wrote back when I was using Python 2 (and sure, this should include closing a file manually).
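The context-manager variant suggested here could look like the following sketch (the function name `checksum_files` is mine):

```python
import hashlib

def checksum_files(flist):
    """Combined MD5 of several files; each file is closed promptly."""
    hash_obj = hashlib.md5()
    for fname in flist:
        with open(fname, 'rb') as f:  # closed on exiting the with-block
            hash_obj.update(f.read())
    return hash_obj.digest()
```

Starting the loop from an empty `hashlib.md5()` also removes the need to treat the first file specially.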
import subprocess

cmd = input("Enter the command : ")
trial = subprocess.run(["powershell", "-Command", cmd])

# PowerShell command: Get-FileHash -Algorithm MD5 -Path (Get-ChildItem "filepath\*.*" -Recurse -Force)
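If you take the subprocess route, you will usually want the hash output back in Python rather than printed to the console; `subprocess.run` supports this via `capture_output=True`. The PowerShell command itself only exists on Windows, so this sketch shells out to the Python interpreter as a portable stand-in child process:

```python
import subprocess
import sys

# Stand-in child process; on Windows you would invoke powershell instead.
result = subprocess.run(
    [sys.executable, "-c", "print('dummy-hash')"],
    capture_output=True,
    text=True,
)

assert result.returncode == 0
assert result.stdout.strip() == "dummy-hash"
```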

