I have a photo library that contains a lot of duplicate images. Unfortunately, the duplicates sometimes have different names, because some of them come from my wife's phone, where the numbering is different.
On iOS there is a feature called Live Photos, where a .MOV file has the same basename as the image. My photo library shows these the same way iOS does (hold to play). Of course, I would like to keep this feature.
So what I would like to do is scan for duplicate files based on their hash and then, within each set of duplicates, keep the file for which a .MOV with the same basename exists and delete the others.
For example:

    -a---  10-8-2013 19:36   2909610 IMG_0992 (1).JPG
    -a---  10-8-2013 19:36   2909610 IMG_0992 (2).JPG
    -a---  10-8-2013 19:36   2909610 IMG_0992.JPG
    -a---  14-7-2013 16:30  30972837 IMG_0992.MOV

So in this example IMG_0992.JPG should be kept (together with IMG_0992.MOV), and the two other JPGs should be deleted.
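To make the rule concrete, this is roughly how I picture choosing the file to keep within one set of identical files. It is only a sketch of the selection rule, not a working solution: `Select-Keeper` is a name I made up, it assumes the .MOV sits in the same folder as the image, and keeping the first file by name when no matching .MOV exists is my own assumption.

```powershell
# Hypothetical helper: given the files of ONE set of identical images,
# return the single file that should be kept.
function Select-Keeper {
    param(
        [Parameter(Mandatory)]
        [System.IO.FileInfo[]] $Files
    )

    foreach ($file in $Files) {
        # Live Photo video: same folder, same basename, .MOV extension
        $mov = Join-Path -Path $file.DirectoryName -ChildPath ($file.BaseName + '.MOV')

        # -LiteralPath avoids wildcard interpretation of characters like [ ]
        if (Test-Path -LiteralPath $mov -PathType Leaf) {
            return $file
        }
    }

    # Assumption: with no matching .MOV, keep the first file sorted by name
    $Files | Sort-Object -Property Name | Select-Object -First 1
}
```

For the listing above, passing the three JPGs should give back IMG_0992.JPG, because IMG_0992.MOV is the only .MOV whose basename matches one of them.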
This is what I have so far:

    # start scanning here
    # (use any other path if you like, e.g. $Path = 'c:\windows')
    $Path = "D:\Test"

    # get a hashtable of all files of size greater than 0,
    # grouped by their length:
    $group = Get-ChildItem -Path $Path -File -Recurse -ErrorAction Ignore |
        # EXCLUDE empty files...
        Where-Object Length -gt 0 |
        # group them by their LENGTH...
        Group-Object -Property Length -AsHashTable

    # take each pile in the hashtable (grouped by their length)
    # and return all files from piles with more than one element:
    $candidates = foreach ($pile in $group.Values)
    {
        # are there at least 2 files in this pile?
        if ($pile.Count -gt 1)
        {
            # yes, add it to the candidates
            $pile
        }
    }

    # these are files that CAN have duplicates and require more testing:
    $candidates

    $duplicates = $candidates |
        # group all files by their hash, placing files with equal content
        # in the same group
        Group-Object -Property {
            (Get-FileHash -Path $_.FullName -Algorithm SHA1).Hash
        } -AsHashTable -AsString

    $duplicates

I don't know how to go on from here.