I created a function that copies a file from dir A to B and compares both checksums before removing A.

Now that I've reinvented the wheel, I want to know how I could have done this better, instead of implementing a new safe_copy() with shutil and hashlib.

  • Are there already libraries doing it in Python?
  • Are there already Windows built-ins?
  • Is anything built into Anaconda?

Info:

  • I can't install 3rd party code, I am working on an offline server.
  • Performance is not an issue
  • Paths to files I must copy are given in a pandas DataFrame(origin, destination)

This question is not about performance per se (though that's always a good point to bring up); it's about code reuse in general.

3 Comments

  • Why are you trying to do this with a Python script when there are hundreds of command-line utilities out there that would solve this problem? It seems you're still hell-bent on reinventing the wheel, as you say. Commented Nov 10, 2020 at 10:12
  • I ask because I'm still a young and foolish dev. I have barely a year of experience, and during that year I didn't get much advice from my seniors (too much work). What would you recommend I use? I'll update the question... ;) Commented Nov 10, 2020 at 11:06
  • A good alternative is the Python library 'dirsync'. Commented Dec 29, 2022 at 18:12

1 Answer

For any starting programmer, it definitely makes sense to dive deeply into something that interests you - if, in your case, that's file management, that's fine of course. Just keep in mind that Python is not at all an optimal language for something that ultimately relies heavily on performance. A language like C++ or Rust might make more sense to learn if that's your passion.

If you do want to continue developing this in Python regardless, you should definitely read through the standard modules os, shutil, pathlib and hashlib. The program you described could be as simple as:

    import sys
    from hashlib import md5
    from os import remove
    from pathlib import Path
    from shutil import copyfile


    def file_md5(fname):
        chunk_size = 16384  # arbitrary
        md5_hash = md5()
        with open(fname, 'rb') as f:
            # read in chunks so large files never have to fit in memory
            for chunk in iter(lambda: f.read(chunk_size), b''):
                md5_hash.update(chunk)
        return md5_hash.hexdigest()


    # raw strings, so that \t, \a and \b are not read as escape sequences
    a = r'C:\temp\a.txt'
    b = r'C:\temp\b.txt'

    if Path(b).is_file():
        print('that file already exists!')
        sys.exit(1)
    else:
        copyfile(a, b)
        if file_md5(a) != file_md5(b):
            print('something is not the same')
        else:
            remove(a)

(Don't just run this script if you have an actual C:\temp\a.txt file, obviously)
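Since the question mentions a list of (origin, destination) pairs and a comment asks about choosing the hash, here is a minimal sketch of how the script above could be wrapped into reusable functions. The names file_digest, safe_move and safe_move_all are made up for this illustration, and the algorithm parameter is assumed to be a fixed-length hash name accepted by hashlib.new(), such as 'md5' or 'sha256'.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_move(origin, destination, algorithm="sha256"):
    """Copy origin to destination and delete origin only if the digests match.

    Returns True when the file was moved and False when verification failed
    (the origin is then left in place). Refuses to overwrite an existing file.
    """
    origin, destination = Path(origin), Path(destination)
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    shutil.copyfile(origin, destination)
    if file_digest(origin, algorithm) != file_digest(destination, algorithm):
        return False
    origin.unlink()
    return True


def safe_move_all(pairs, algorithm="sha256"):
    """Apply safe_move to each (origin, destination) pair; return the failed pairs."""
    return [(o, d) for o, d in pairs if not safe_move(o, d, algorithm)]
```

With a DataFrame df that has origin and destination columns, the pairs could be produced with df[["origin", "destination"]].itertuples(index=False, name=None), which yields plain tuples, so the function itself does not need pandas.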

There exist thousands of file management utilities that have been developed for decades and are highly optimised, for speed or for very specific functions. In almost any real-world project, it would make much more sense to combine/package several of those and script them together using a batch language (or perhaps Python) than to rewrite them from the ground up.

Rewriting can make sense to learn more about how they work internally, but you'll likely find yourself abandoning the work once you understand them. Another reason to rewrite could be because you have a clever idea on how to do it better, but that's where other languages are almost guaranteed to outperform Python.

Follow-up to comment: as far as I'm aware, there's no single utility in Windows that does a 'safe-copy' in one go. I think that's mainly because you can pretty much rely on utilities like robocopy (standard on Windows) to fail if there's a problem, so you can rest assured your copy is good if it completes without error.

However, I can appreciate wanting to be extra sure, so it would be fairly simple to string something like robocopy together with cmdlets like Get-FileHash from PowerShell. PowerShell is a standard part of Windows as well, and writing a .ps1 script is not much more complicated than writing a batch file. A simple "copy this file, get and compare the file hashes, and remove the appropriate file based on the outcome" PowerShell script would be only a few lines, no installation required.
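If the scripting is done in Python rather than PowerShell, the same "string the tools together" idea could look roughly like the sketch below. It assumes the basic robocopy <source dir> <destination dir> <file name> invocation and robocopy's convention that exit codes of 8 and above signal a failure. The function names are invented for this example, and robocopy keeps the original file name, so only a destination directory can be given. The returned path can then be hashed and compared with the original, as in the script above.

```python
import subprocess
from pathlib import Path


def robocopy_command(source_file, destination_dir):
    """Build the argument list: robocopy <source dir> <destination dir> <file name>."""
    source_file = Path(source_file)
    return ["robocopy", str(source_file.parent), str(destination_dir), source_file.name]


def robocopy_succeeded(exit_code):
    """robocopy exit codes are bit flags; values of 8 and above indicate a failure."""
    return 0 <= exit_code < 8


def copy_with_robocopy(source_file, destination_dir):
    """Run robocopy (Windows only) and return the path of the copy, or None on failure."""
    result = subprocess.run(robocopy_command(source_file, destination_dir))
    if not robocopy_succeeded(result.returncode):
        return None
    return Path(destination_dir) / Path(source_file).name
```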


2 Comments

I reformatted the question; my apologies for being messy. If you know of a built-in command (Windows, Python, or anything else) to do a safe_copy (bonus points if we can choose the hash), I'll wait for you to edit it into your answer for completeness' sake. But I'm already going to accept it. Thank you very much for your advice.
I've added a few remarks about robocopy and Get-FileHash in PowerShell.
