  • I've used duff, fdupes, and rmlint, and strongly recommend that readers look at the third of these. It has an excellent option set (and documentation). With it, I was able to avoid much of the post-processing I needed with the other tools. Commented Sep 2, 2015 at 6:32
  • In my experience the filename is the least reliable factor to look at, and I've completely removed it from any de-duping I do. How many install.sh files can be found on an active system? I can't count the number of times I've saved a file and had a name clash, with some on-the-fly renaming to save it. Flip side: no idea how many times I've downloaded something from different sources, on different days, only to find they are the same file with different names. (Which also kills timestamp reliability.) Check in this order: 1: size, 2: digest, 3: byte contents (see the first sketch after this list). Commented Jan 28, 2017 at 6:40
  • @GypsySpellweaver: (1) depends on personal use case, wouldn't you agree? In my case, I have multiple restores from multiple backups, where files with the same name and content exist in different restore folders. (2) Your comment seems to assume comparing by filename only; I was not suggesting eliminating the other checks. Commented Mar 8, 2017 at 21:50
  • Don't use rmlint. It emits a shell script that doesn't check for errors: it deletes the duplicate, then attempts to create the hard link, which can fail (e.g. due to too many links), and then moves on to the next duplicate, which suffers the same fate. It eats your files. The correct approach is to create the hard link under a temporary name and then replace the duplicate via rename (see the second sketch below). Commented Oct 7, 2020 at 10:01
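
For readers who want to see the size → digest → byte-contents ordering from the comment above in concrete form, here is a minimal Python sketch. It is not how duff, fdupes, or rmlint are implemented; the function names, the choice of SHA-256, and the chunk size are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Sketch of duplicate detection ordered by 1: size, 2: digest, 3: byte contents.
Illustrative only -- not the implementation of any tool mentioned in the comments."""
import hashlib
import os
import sys
from collections import defaultdict
from filecmp import cmp


def sha256_of(path, chunk=1 << 20):
    """Hash file contents in chunks so large files don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


def find_duplicates(root):
    # 1: group by size -- cheap, and eliminates most non-duplicates immediately
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            if os.path.isfile(p) and not os.path.islink(p):
                by_size[os.path.getsize(p)].append(p)

    # 2: within each same-size group, group by content digest
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[sha256_of(p)].append(p)

        # 3: confirm with a byte-for-byte comparison before reporting
        for group in by_hash.values():
            dupes = [p for p in group[1:] if cmp(group[0], p, shallow=False)]
            if dupes:
                yield [group[0]] + dupes


if __name__ == "__main__":
    for group in find_duplicates(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("\n".join(group), end="\n\n")
```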
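And a minimal sketch of the safe replacement order described in the last comment: the hard link is created under a temporary name first, and the duplicate is only replaced once that succeeds, so a failed link leaves the duplicate intact. The helper name and temporary suffix are assumptions, not rmlint's actual output.

```python
#!/usr/bin/env python3
"""Replace a duplicate file with a hard link to the original, failing safely."""
import os
import sys


def replace_with_hardlink(original, duplicate):
    """Replace `duplicate` with a hard link to `original`, never leaving a
    window in which the duplicate's data exists under no name."""
    tmp = duplicate + ".dedup-tmp"  # assumed temporary name, same directory/filesystem
    try:
        os.link(original, tmp)      # may fail, e.g. too many links (EMLINK)
    except OSError as e:
        print(f"skipping {duplicate}: cannot link ({e})", file=sys.stderr)
        return False
    try:
        os.rename(tmp, duplicate)   # atomic replacement on POSIX filesystems
    except OSError as e:
        os.unlink(tmp)              # clean up; the duplicate is untouched
        print(f"skipping {duplicate}: cannot rename ({e})", file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} ORIGINAL DUPLICATE")
    sys.exit(0 if replace_with_hardlink(sys.argv[1], sys.argv[2]) else 1)
```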