
I have a photo library that contains a lot of duplicate images. Unfortunately, the duplicates sometimes have different names, because some of them come from my wife's phone, where the numbering is different.

On iOS there is a feature called Live Photos, where a .MOV file has the same basename as the image. My photo library shows these the same way iOS does (hold to play). Of course, I would like to keep this feature.

So what I would like is to scan for duplicate files based on hash, and then, if a .MOV with the same basename exists for one of the duplicates, keep that file and delete the others.

    -a---   10-8-2013  19:36    2909610 IMG_0992 (1).JPG
    -a---   10-8-2013  19:36    2909610 IMG_0992 (2).JPG
    -a---   10-8-2013  19:36    2909610 IMG_0992.JPG
    -a---   14-7-2013  16:30   30972837 IMG_0992.MOV

So in this example IMG_0992.JPG should be kept, the others deleted.

    # start scanning here:
    # (default to personal documents folder)
    # use any other path if you like:
    # i.e.: $Path = 'c:\windows'
    $Path = "D:\Test"

    # get a hashtable of all files of size greater 0
    # grouped by their length:
    $group = Get-ChildItem -Path $Path -File -Recurse -ErrorAction Ignore |
        # EXCLUDE empty files...
        Where-Object Length -gt 0 |
        # group them by their LENGTH...
        Group-Object -Property Length -AsHashTable

    # take each pile in the hashtable (grouped by their length)
    # and return all files from piles greater than one element:
    $candidates = foreach($pile in $group.Values)
    {
        # are there at least 2 files in this pile?
        if ($pile.Count -gt 1)
        {
            # yes, add it to the candidates
            $pile
        }
    }

    # these are files that CAN have duplicates and require more
    # testing:
    $candidates

    $duplicates = $candidates |
        # group all files by their hash, placing files with equal content
        # in the same group
        Group-Object -Property {
            (Get-FileHash -Path $_.FullName -Algorithm SHA1).Hash
        } -AsHashTable -AsString

    $duplicates

I don't know how to proceed from here.

  • You say, "if a .MOV with the same basename exists, that file is kept and the others are deleted." But then you show an example that includes a .MOV file and say that the .JPG will be kept. What am I missing? Commented Jan 1 at 18:31
  • Keep the JPG that has a MOV file with the same name. Commented Jan 2 at 16:20

2 Answers

"So what i would like is to scan for duplicate files based on hash, then make sure, if a .MOV with the same basename exists, that file is kept and the others are deleted."

I would examine only the files that actually need to be checked. A JPG that has a MOV with the same basename must be kept, so start by removing those JPG files from the list:

    $jpgs = Get-ChildItem *.JPG
    $movs = Get-ChildItem *.MOV
    $jpgs | Where-Object { $_.BaseName -notin $movs.BaseName }

This produces (based on the 4 files in your question):

    IMG_0992 (1).JPG
    IMG_0992 (2).JPG

Based on an answer to: Array subtraction in PowerShell


3 Comments

It might be helpful to add | Remove-Item -WhatIf.
+1 for a clever simplification, but note that, given the OP's use of -Recurse, i.e. processing of an entire directory subtree, your solution may at least hypothetically result in false positives. Also, and this may or may not matter in practice, the array "subtraction" method you're using is inefficient.
Good thinking, that speeds up the process.
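
The two concerns raised in the comments above (false positives under -Recurse, and the cost of -notin against an array) could be handled together by keying each movie on its directory plus basename and storing the keys in a hash set. The following is only a minimal sketch under those assumptions: the function name Get-UnpairedJpg and the "directory|basename" key format are made up for illustration and do not come from the answer.

```powershell
# Hypothetical helper (not from the answer): lists the JPG files under $Path
# that have no .MOV companion with the same basename in the SAME directory.
function Get-UnpairedJpg {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    $files = Get-ChildItem -LiteralPath $Path -File -Recurse

    # One key per movie, "directory|basename". The HashSet gives constant-time
    # lookups; OrdinalIgnoreCase makes the key comparison case-insensitive.
    $movKeys = [System.Collections.Generic.HashSet[string]]::new(
        [System.StringComparer]::OrdinalIgnoreCase)
    foreach ($mov in $files | Where-Object Extension -eq '.mov') {
        [void] $movKeys.Add("$($mov.DirectoryName)|$($mov.BaseName)")
    }

    # Emit only the photos whose key is not in the set.
    $files |
        Where-Object Extension -eq '.jpg' |
        Where-Object { -not $movKeys.Contains("$($_.DirectoryName)|$($_.BaseName)") }
}
```

Calling Get-UnpairedJpg -Path 'D:\Test' only lists candidates. Like the snippet in the answer, it does not check whether a listed photo is actually a duplicate, so the hash comparison is still needed before anything is deleted.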

@Luuk's answer is a great starting point.

Here is my commented alternative, which I have no doubt could be improved.
Remove-Item is called with the -WhatIf parameter for safety.
If it works as intended, remember to remove it.

    # Let's get all the MOV files.
    $MovieList = Get-ChildItem -File -Filter '*.mov' |
        # But we only need the base names without the extension.
        Select-Object -ExpandProperty BaseName

    # Let's get all the photos.
    $PhotoGroupArray = Get-ChildItem -File -Filter '*.jpg' |
        # Hashing them.
        Get-FileHash |
        # Grouping them by Hash, so 1 Hash : N Photos
        Group-Object -Property Hash |
        # Skipping Hashes with just 1 photo: they have no copies we need to remove.
        Where-Object { $_.Group.Count -gt 1 } |
        # Simplifying the resulting object by removing unneeded properties, though it's optional.
        Select-Object -Property @{Name = 'FileName'; Expression = { $_.Group.Path } }

    # For each group of photos.
    foreach ($PhotoGroup in $PhotoGroupArray) {
        # Check if any photo name is the same as any movie name, except for the extension of course.
        # If not, adds the photo to the array of files to remove.
        $RemoveList = foreach ($Photo in $PhotoGroup.FileName) {
            if ([System.IO.Path]::GetFileNameWithoutExtension($Photo) -notin $MovieList) {
                $Photo
            }
        }

        # If the number of photos to remove is the same as the full array of photos, it means there was no movie with the same name.
        if ($RemoveList.Count -eq $PhotoGroup.FileName.count) {
            # In this case remove every photo except the shortest name one.
            $RemoveList = $RemoveList | Sort-Object -Property Length -Descending -Top ($RemoveList.count - 1)
        }

        # Remove the compiled list.
        $RemoveList | Remove-Item -WhatIf
    }
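
For a whole directory tree, as in the question, the same idea might be sketched as a function that only lists the files to delete, so the result can be reviewed first. This is an illustration under stated assumptions, not a tested replacement for the script above: the name Get-RedundantJpg is hypothetical, a movie only counts as a match when it sits in the same directory as the photo, and Get-FileHash is left at its default algorithm (SHA256).

```powershell
# Hypothetical helper (not part of the answer): emits the duplicate JPG files
# that could be deleted. Every photo with a same-named .MOV in its own
# directory is kept; if a duplicate group has no such photo, the one with
# the shortest name is kept.
function Get-RedundantJpg {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    $files = Get-ChildItem -LiteralPath $Path -File -Recurse

    # Directory + basename of every movie, for fast case-insensitive lookup.
    $movStems = [System.Collections.Generic.HashSet[string]]::new(
        [System.StringComparer]::OrdinalIgnoreCase)
    foreach ($mov in $files | Where-Object Extension -eq '.mov') {
        [void] $movStems.Add((Join-Path $mov.DirectoryName $mov.BaseName))
    }

    $files |
        Where-Object Extension -eq '.jpg' |
        # Equal size is a cheap pre-filter before hashing.
        Group-Object -Property Length |
        Where-Object Count -gt 1 |
        ForEach-Object { $_.Group } |
        # Files with equal content end up in the same group.
        Group-Object -Property { (Get-FileHash -LiteralPath $_.FullName).Hash } |
        Where-Object Count -gt 1 |
        ForEach-Object {
            # Split the group into photos with and without a matching movie.
            $paired, $unpaired = $_.Group.Where(
                { $movStems.Contains((Join-Path $_.DirectoryName $_.BaseName)) },
                'Split')

            if ($paired.Count -gt 0) {
                # A paired copy exists, so every unpaired copy is redundant.
                $unpaired
            }
            else {
                # No movie anywhere: keep the shortest name, emit the rest.
                $unpaired |
                    Sort-Object -Property { $_.Name.Length }, FullName |
                    Select-Object -Skip 1
            }
        }
}
```

A cautious way to use it would be Get-RedundantJpg -Path 'D:\Test' | Remove-Item -WhatIf, dropping -WhatIf only once the listed files look right.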

2 Comments

"but it also deletes photos without a matching movie file", But I only said something about "examine only the files...."
@Luuk sorry, I'm dumb. Edited my answer.
