3

On Linux, I have a folder (without subfolders) that contains many files named according to the scheme below.

I list them with ls -1:

    1yBWVnZCx8CoPrGIG.part01.rar
    1yBWVnZCx8CoPrGIG.part02.rar
    1yBWVnZCx8CoPrGIG.part03.rar
    1yBWVnZCx8CoPrGIG.part04.rar
    1yBWVnZCx8CoPrGIG.part05.rar
    1yBWVnZCx8CoPrGIG.part06.rar
    1yBWVnZCx8CoPrGIG.part07.rar
    1yBWVnZCx8CoPrGIG.part08.rar
    1yBWVnZCx8CoPrGIG.part09.rar
    1yBWVnZCx8CoPrGIG.part10.rar
    1yBWVnZCx8CoPrGIG.part11.rar
    1yBWVnZCx8CoPrGIG.part12.rar
    1yBWVnZCx8CoPrGIG.part13.rar
    1yBWVnZCx8CoPrGIG.part14.rar
    1yBWVnZCx8CoPrGIG.part15.rar
    1yBWVnZCx8CoPrGIG.part16.rar
    1yBWVnZCx8CoPrGIG.part17.rar
    1yBWVnZCx8CoPrGIG.part18.rar
    1yBWVnZCx8CoPrGIG.part19.rar
    1yBWVnZCx8CoPrGIG.part20.rar
    1yBWVnZCx8CoPrGIG.part21.rar
    1yBWVnZCx8CoPrGIG.part22.rar
    1yBWVnZCx8CoPrGIG.part23.rar
    DaHs0QJnJbt.part1.rar
    DaHs0QJnJbt.part2.rar
    DaHs0QJnJbt.part3.rar
    DaHs0QJnJbt.part4.rar
    n5oTzoLvG.part1.rar
    n5oTzoLvG.part2.rar
    n5oTzoLvG.part3.rar
    n5oTzoLvG.part4.rar
    n5oTzoLvG.part5.rar
    n5oTzoLvG.part6.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part1.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
    RSmWMPb0vWr8LIEFtR7o.part1.rar
    RSmWMPb0vWr8LIEFtR7o.part2.rar
    RSmWMPb0vWr8LIEFtR7o.part3.rar
    RSmWMPb0vWr8LIEFtR7o.part4.rar
    RSmWMPb0vWr8LIEFtR7o.part5.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part01.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part02.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part03.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part04.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part05.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part06.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part07.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part08.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part09.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part10.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
    tBJDjsyJtFpY0d3aQ.part1.rar
    tBJDjsyJtFpY0d3aQ.part2.rar
    tBJDjsyJtFpY0d3aQ.part3.rar
    tBJDjsyJtFpY0d3aQ.part4.rar
    tBJDjsyJtFpY0d3aQ.part5.rar
    tBJDjsyJtFpY0d3aQ.part6.rar
    W1Pn8SHf7pbMSf1u99C4f.part1.rar
    W1Pn8SHf7pbMSf1u99C4f.part2.rar
    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part1.rar
    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

(In the future, there may also be files with names like name.part001.rar, name.part002.rar, ..., name.part123.rar.)

I am looking for a way to get a list that contains only the last .partN.rar file of each archive.

I want to see:

    1yBWVnZCx8CoPrGIG.part23.rar
    DaHs0QJnJbt.part4.rar
    n5oTzoLvG.part6.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
    RSmWMPb0vWr8LIEFtR7o.part5.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
    tBJDjsyJtFpY0d3aQ.part6.rar
    W1Pn8SHf7pbMSf1u99C4f.part2.rar
    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

How can I do it?

7 Answers

11

With zsh:

    $ files=( *.part<->.rar(Nn) )
    $ typeset -A last
    $ for f ($files) last[$f:r:r]=$f
    $ print -roC1 -- $last
    1yBWVnZCx8CoPrGIG.part23.rar
    DaHs0QJnJbt.part4.rar
    n5oTzoLvG.part6.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
    RSmWMPb0vWr8LIEFtR7o.part5.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
    tBJDjsyJtFpY0d3aQ.part6.rar
    W1Pn8SHf7pbMSf1u99C4f.part2.rar
    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

Explanations:

    # Put a listing of all files that match the "glob" pattern
    #   ANYTHING.partANYNUMBER.rar
    # into the variable called "files"
    # without erroring out if nothing matches (N),
    # and sorting things numerically (n)
    files=( *.part<->.rar(Nn) )

    # make an associative array (-A) called "last"
    # (in Python, this would be called a dict,
    # in C++ a std::map<std::string,std::string>)
    typeset -A last

    # loop over all entries in "files", and for each strip the last file name suffix
    # (as separated by a "."), twice; i.e., remove the .rar, and remove the .partANYNUMBER.
    # Store the file name in last[twiceshortened]
    for f ($files) last[$f:r:r]=$f
    # Note that this overwrites the entry for 1yBWVnZCx8CoPrGIG until the highest-
    # numbered part is reached.

    # Print the full list:
    print -roC1 -- $last
    # -r    : no fancy escaping (we don't need that and it makes things strange)
    # -o    : print sorted, in ascending order
    # -C1   : print as 1 column
    # --    : the stuff to be printed follows after this
    # $last : print the content (not the keys!) of the "last" associative array

    # You get:
    1yBWVnZCx8CoPrGIG.part23.rar
    DaHs0QJnJbt.part4.rar
    n5oTzoLvG.part6.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
    RSmWMPb0vWr8LIEFtR7o.part5.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
    tBJDjsyJtFpY0d3aQ.part6.rar
    W1Pn8SHf7pbMSf1u99C4f.part2.rar
    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

  • That was nice, and it refreshed my memory of the <number range> glob! So I added a bit of an explanation. I wonder whether adding (.) to the glob qualifiers would make sense. Commented Aug 17 at 12:48
10

If you trust the ls -1 command (that is, if your file names contain no whitespace such as spaces or newlines), you can use:

    $ ls -r1 | sort -t'.' -k1,1 -u
    1yBWVnZCx8CoPrGIG.part23.rar
    DaHs0QJnJbt.part4.rar
    n5oTzoLvG.part6.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
    RSmWMPb0vWr8LIEFtR7o.part5.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
    tBJDjsyJtFpY0d3aQ.part6.rar
    W1Pn8SHf7pbMSf1u99C4f.part2.rar
    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

Reverse-sort the listing (ls -r), then run sort with . as the field separator (-t'.'), compare only the first field (-k1,1), and print only one line for each distinct value of that field (-u).

  • Worth noting that it assumes that file names contain no newline characters, that they contain no . other than the ones before part and rar, and that the numbers are always 0-padded to the length of the highest part number. Commented Aug 17 at 12:40
  • That would print one of the input file names that begin with 1yBWVnZCx8CoPrGIG, etc., not necessarily the first one - you'd need GNU sort for -s (stable sort) to guarantee output order based on input order. Commented Aug 17 at 18:19
4

Using perl:

    $ perl -MFile::Basename -e '
      my @files = glob q(*.rar);
      my %bases;

      foreach (@files) {
        my ($b, $p) = split /\.part/, basename $_, q(.rar);
        $bases{$b} = $p if (!defined $bases{$b} or $p > $bases{$b});
      };

      foreach my $k (sort keys %bases) {
        print "$k.part$bases{$k}.rar\n"
      }'
    1yBWVnZCx8CoPrGIG.part23.rar
    DaHs0QJnJbt.part4.rar
    RSmWMPb0vWr8LIEFtR7o.part5.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
    W1Pn8SHf7pbMSf1u99C4f.part2.rar
    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar
    n5oTzoLvG.part6.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
    tBJDjsyJtFpY0d3aQ.part6.rar

First this gets all the filenames matching the glob *.rar into an array called @files. Then it converts the array into a hash (aka associative array) called %bases where each key is the base filename and the value is the highest part number seen for that basename.

Then it prints out each basename with the part number in the same format as the original filenames: base.partN.rar

This uses the File::Basename module, which is included with perl as part of its standard library.

There are shorter, more obfuscated ways to do this in perl, but this IMO is a nice balance between brevity and readability. It was written to pass the restrictions of use strict (or -Mstrict on the command-line), so it could be re-used as part of a larger script. That's why there are all the my declarations that are usually skipped for one-liners.
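
For readers who would rather stay in the shell, the same bookkeeping (remember the highest part number seen per base name) can be sketched in POSIX awk. This is only an illustration and not part of the answer above: the function name last_parts is made up here, and the sketch assumes one file name per line on standard input, so names must not contain newlines.

```shell
# last_parts (hypothetical name): read file names, one per line, and print
# the highest-numbered .partN.rar of each archive, sorted by name.
last_parts() {
    awk '
        # Only consider names that end in .part<digits>.rar
        match($0, /\.part[0-9]+\.rar$/) {
            base = substr($0, 1, RSTART - 1)                # text before ".part"
            num  = substr($0, RSTART + 5, RLENGTH - 9) + 0  # the digits, as a number
            if (!(base in max) || num > max[base]) {
                max[base]  = num
                name[base] = $0
            }
        }
        END { for (b in name) print name[b] }
    ' | LC_ALL=C sort
}

# Possible usage: printf '%s\n' *.rar | last_parts
```

Because the suffix is matched at the end of the name and the digits are compared as numbers, this sketch should cope with dots in the base name and with part numbers that are not zero-padded.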

3

Using any awk:

    $ ls -r1 | awk -F'.' '!seen[$1]++'
    tBJDjsyJtFpY0d3aQ.part6.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
    n5oTzoLvG.part6.rar
    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar
    W1Pn8SHf7pbMSf1u99C4f.part2.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
    RSmWMPb0vWr8LIEFtR7o.part5.rar
    DaHs0QJnJbt.part4.rar
    1yBWVnZCx8CoPrGIG.part23.rar

If you don't want to parse the output of ls then you could alternatively do this with any printf, sort and awk:

printf '%s\n' * | sort -t'.' -r -k1,1 -k2,2 | awk -F'.' '!seen[$1]++' 

That assumes your file names do not contain newlines, that the number of digits at the end of each part is consistent for each substring before part, and there is no . before part in any of the strings.

  • The reason you don't parse the output of ls (without -l) is that it's newline delimited, while filenames can be made of any number of lines. With '%s\n', you're printing the file names newline delimited just the same. To print list of file paths in a way that can be processed reliably, you want NUL ('%s\0' or GNU ls --zero) instead of NL as the delimiter (and use sort -z, gawk -v RS='\0' etc assuming GNU implementations) Commented Aug 18 at 18:45
3

Using a bash that has the loadable builtin kv (most probably version 5.3 or later):

    #!/usr/bin/env bash

    shopt -s nullglob

    enable kv || exit

    kv -A assoc -s . -d '' < <(
      printf '%s\0' *.rar
    )

    for i in "${!assoc[@]}"; do
      printf '%s.%s\n' "$i" "${assoc["$i"]}"
    done

Output

    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar
    DaHs0QJnJbt.part4.rar
    n5oTzoLvG.part6.rar
    W1Pn8SHf7pbMSf1u99C4f.part2.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
    RSmWMPb0vWr8LIEFtR7o.part5.rar
    1yBWVnZCx8CoPrGIG.part23.rar

According to help kv:

    kv: kv [-A ARRAYNAME] [-s SEPARATORS] [-d RS]
        Read key-value pairs into an associative array.

        Read delimiter-terminated records composed of a single key-value pair
        from the standard input and add the key and corresponding value
        to the associative array ARRAYNAME. The key and value are separated
        by a sequence of one or more characters in SEPARATORS. Records are
        terminated by the first character of RS, similar to the read and
        mapfile builtins.

        If SEPARATORS is not supplied, $IFS is used to separate the keys
        and values. If RS is not supplied, newlines terminate records.
        If ARRAYNAME is not supplied, "KV" is the default array name.

        Returns success if at least one key-value pair is stored in ARRAYNAME.

If sorted output is needed (as in the OP's expected output), one more loadable builtin, asort, can be used. Its source is at: https://cgit.git.savannah.gnu.org/cgit/bash.git/plain/examples/loadables/asort.c

Something like:

    #!/usr/bin/env bash

    shopt -s nullglob

    enable kv || exit
    enable asort || exit

    kv -A assoc -s . -d '' < <(
      printf '%s\0' *.rar
    )

    keys=("${!assoc[@]}")

    asort keys

    for i in "${keys[@]}"; do
      printf '%s.%s\n' "$i" "${assoc["$i"]}"
    done

Output

    1yBWVnZCx8CoPrGIG.part23.rar
    DaHs0QJnJbt.part4.rar
    n5oTzoLvG.part6.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
    RSmWMPb0vWr8LIEFtR7o.part5.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
    tBJDjsyJtFpY0d3aQ.part6.rar
    W1Pn8SHf7pbMSf1u99C4f.part2.rar
    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar

According to help asort:

    asort: asort [-nr] array ...  or  asort [-nr] -i dest source
        Sort arrays in-place.

        Options:
          -n  compare according to string numerical value
          -r  reverse the result of comparisons
          -i  sort using indices/keys

        If -i is supplied, SOURCE is not sorted in-place, but the indices (or keys
        if associative) of SOURCE, after sorting it by its values, are placed as
        values in the indexed array DEST

        Associative arrays may not be sorted in-place.

        Exit status:
        Return value is zero unless an error happened (like invalid variable name
        or readonly array).

NOTE: As Stéphane Chazelas points out, the separator (here a dot/period) must not appear anywhere else in the file name. The script does work with the file names given by the OP.

However, the files the question mentions as a future possibility (name.part001.rar, name.part002.rar, ..., name.part123.rar) cannot be parsed properly by the current script.

  • Worth noting that it assumes that file names contain no . other than the ones before part and rar, and that the numbers are always 0-padded to the length of the highest part number (it wouldn't work correctly, for instance, after touch foo.part{1..12}.rar, where it would return part9 instead of part12). Commented Aug 19 at 6:01
  • ... i.e. you're relying on the order in which the shell expands that glob pattern into file names, which is in fact alphabetical: 12 is expanded before 9, giving the order x.part12.rar x.part9.rar. So 9 gets processed after 12 and overwrites its key in the array, and at the end you get x.part9.rar printed where it should have been x.part12.rar. Commented Aug 19 at 10:44
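
The ordering problem described in these comments can be reproduced without any loadable builtin. The sketch below is an illustration under assumptions, not part of the answer: keep_last is a made-up helper that mimics the "last one wins" overwrite with awk, LC_ALL=C sort stands in for the order a glob expands in, and the numeric variant assumes the archive name contains no dots.

```shell
# keep_last (hypothetical helper): remember the most recent line seen for
# each archive name, i.e. "last one wins", then print what is left.
keep_last() {
    awk '{
        base = $0
        sub(/\.part[0-9]+\.rar$/, "", base)   # strip the .partN.rar suffix
        last[base] = $0                       # later lines overwrite earlier ones
    }
    END { for (b in last) print last[b] }'
}

# Lexical order, similar to how a glob expands: part12 comes before part9.
lexical=$(printf '%s\n' x.part9.rar x.part12.rar | LC_ALL=C sort | keep_last)

# Numeric order on the digits after "part" (from the 5th character of field 2).
numeric=$(printf '%s\n' x.part9.rar x.part12.rar |
    LC_ALL=C sort -t . -k 1,1 -k 2.5n | keep_last)

printf 'lexical: %s\nnumeric: %s\n' "$lexical" "$numeric"
```

This should print x.part9.rar for the lexical case and x.part12.rar for the numeric one, which is why any "last one wins" approach needs its input in numeric part order.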
1

It can be done in bash:

    {
    unset files
    declare -A files
    for f in *.rar
    do
        if [[ "$f" =~ (.*\.part)([[:digit:]]+)\.rar ]]
        then
            if [[ -v "files[${BASH_REMATCH[1]}]" ]]
            then
                (( 10#${BASH_REMATCH[2]} > 10#${files[${BASH_REMATCH[1]}]} )) &&
                    files[${BASH_REMATCH[1]}]="${BASH_REMATCH[2]}"
            else
                files[${BASH_REMATCH[1]}]="${BASH_REMATCH[2]}"
            fi
        fi
    done
    for i in "${!files[@]}"
    do
        printf '%s\n' "$i${files[$i]}.rar"
    done
    }

... outputs:

    tBJDjsyJtFpY0d3aQ.part6.rar
    RSmWMPb0vWr8LIEFtR7o.part5.rar
    DaHs0QJnJbt.part4.rar
    W1Pn8SHf7pbMSf1u99C4f.part2.rar
    T7yvBIqHK82qDCNTtz9iuvp2NhQ.part11.rar
    n5oTzoLvG.part6.rar
    XYcUpv7b1ZpcczFT5y7Uc9mQTvAf88kl.part2.rar
    okgbuh8VUxSguDNra9uMTtDlXhiLWQmY.part2.rar
    1yBWVnZCx8CoPrGIG.part23.rar

The idea is to arithmetically compare the part numbers in identical filenames (identical except for the part number) and find which number is the greatest.
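
One detail worth spelling out is the 10# prefix in the comparison: it makes bash read a zero-padded number such as 08 as decimal instead of rejecting it as invalid octal. In a shell without the base# syntax, stripping the leading zeros by hand is one alternative. A minimal sketch, where strip_zeros and part_gt are made-up helper names and the inputs are assumed to consist of digits only:

```shell
# strip_zeros (hypothetical helper): drop leading zeros but keep a lone "0".
strip_zeros() {
    n=$1
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    printf '%s\n' "$n"
}

# part_gt A B (hypothetical helper): succeed if part number A is
# numerically greater than part number B.
part_gt() {
    [ "$(strip_zeros "$1")" -gt "$(strip_zeros "$2")" ]
}

part_gt 010 009 && echo "010 is greater than 009"
```

The loop removes one zero at a time and stops when a single character is left, so an all-zero input such as 000 becomes 0 rather than an empty string.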

  • That [[ -v "files[$...]" ]] is an arbitrary command execution vulnerability, at least with bash 5.1, where I tried this. Try for instance x='x$(uname>&2)' bash -c 'typeset -A files; [[ -v "files[$x]" ]]', which for me outputs Linux on stderr. Commented Aug 20 at 3:47
  • @StéphaneChazelas AFAIK that is fixed in bash 5.2 (released, probably, 3 years ago), and I have just confirmed that on version 5.2.37. It exists on 5.1 as you say, and on 5.0 IIRC; I don't know about prior versions. It still exists on zsh 5.9 though. Commented Aug 20 at 10:09
  • Related: How to use associative arrays safely inside arithmetic expressions?. On zsh, you'd use (( $+hash[$key] )) Commented Aug 20 at 10:32
1
    FOLDER="myarchivefolder"; ( \
      cd "$FOLDER"; \
      find . -type f -iname '*.rar' | \
      cut -d '/' -f 2 | \
      LC_ALL=C sort -t "." --key 1,1 --key 2.5nr -r | \
      sort --merge --unique -t "." --key 1,1 \
    )

Explanation:

  1. cd into the folder inside a subshell, to avoid unexpected dots in the file paths
  2. Use find to list all files in the current folder (add -maxdepth 1 to avoid descending into nested folders)
  3. cut to remove the leading ./ from the find output
  4. Sort the file names
    • -t "." - dot . as field separator
    • --key 1,1 - sort by the first field, the archive name
    • --key 2.5nr - then sort numerically (n) and in reverse (r) by part number, starting at the 5th character of the second field; the key carries its own r because a key that has its own ordering options does not inherit the global -r
    • -r - reverse the remaining key (the archive name); together with the r above, the highest part number of each archive ends up as its first line
  5. Only output one archive file for each sorted entry
    • sort --merge - skip sorting
    • --unique - subsequent lines for the same archive are omitted as duplicates
    • --key 1,1 - only consider the archive name for deduplication

The good:

  • avoids globbing: passing a large glob expansion to an external command can fail with "Argument list too long" in large folders (a shell for loop or builtin is not subject to that limit)
  • correctly handles 3-digit ".part025" numbers starting with zero
  • It looks to be POSIX-compatible, if you replace long options with short ones.

The bad:

  • Expects that the archive name itself will not contain dots (nor newlines). It would've been more robust to count dots from the end of a file path, but it is not possible with sort's key definition.
  • Many process invocations and piping
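
One way around the "count dots from the end" limitation is to decorate each name before sorting, so that sort never has to split on dots at all. The following is a sketch under assumptions rather than part of the answer: last_parts_decorated is a made-up name, input is one bare file name per line (no leading ./ and no newlines), and / serves as the temporary separator because a file name cannot contain it.

```shell
# last_parts_decorated (hypothetical name): read bare file names, one per
# line, and print the highest-numbered .partN.rar of each archive.
last_parts_decorated() {
    # 1. Decorate: "base/number/original name". The greedy .* means the
    #    LAST ".part<digits>.rar" is used, so dots in the base are fine.
    sed -n 's|^\(.*\)\.part\([0-9][0-9]*\)\.rar$|\1/\2/&|p' |
    # 2. Sort by base, then by part number, highest first.
    LC_ALL=C sort -t / -k 1,1 -k 2,2nr |
    # 3. Keep the first line of each base and undecorate it.
    awk -F / '!seen[$1]++ { print $3 }'
}

# Possible usage: ls -1 | last_parts_decorated
```

The trade-off is one extra process (sed) in exchange for handling archive names that contain dots and part numbers of varying width.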
