
So I have a file full of test commands that I like to run against some of my functions to make sure they are handling all possible situations correctly. No point in having duplicate commands tho. Here's some examples:

rap ,Xflg MIT X11
rap ,XPBfl 'MITER'
rap ,Bflg share git-grep
rap ,bfl X11
rap ,Bfl xzfgrep
rap ,Bf X11

... my function 'rap' uses a comma instead of a dash to indicate the start of letter options, then there's some argument following. Since the order of these options doesn't matter:

rap ,Bf X11
rap ,fB X11

... are exactly the same command. Easy to remove duplicate lines from the file of course, however to avoid the above problem, what I'd like to be able to do is to sort the options alphabetically so that the above would end up:

rap ,Bf X11
rap ,Bf X11

... and I'd then be able to delete the duplicates. Can something like that be done without heroics? Note this is not sorting 'by' the list of options, but sorting the options themselves.

  • Why not use dashes as usual and then employ getopts to parse the options? That would give you the expected behaviour (with regards to parsing command-line options) of your rap function and minimize the amount of extraneous code that you would need to maintain. Commented Apr 21, 2024 at 15:24
  • Can you ever have rap ,B ,f X11 and, if so, should it be considered equivalent to rap ,Bf X11? Commented Apr 22, 2024 at 11:52
  • I'm experimenting with a different paradigm for options processing, thus I'm sorta rethinking 'getopts' entirely. Commented Apr 23, 2024 at 14:10
  • ... for now -- and probably forever, I'm avoiding permitting pointless looseness in option strings. I require my options to be one word cuz there's no reason for options chaos to be permitted. I like freedom that has some point to it. Commented Apr 23, 2024 at 14:13

5 Answers

Another perl variant:

$ perl -pe 's{^rap ,\K\S+}{join "", sort split //, $&}e' file
rap ,Xfgl MIT X11
rap ,BPXfl 'MITER'
rap ,Bfgl share git-grep
rap ,bfl X11
rap ,Bfl xzfgrep
rap ,Bf X11

For your extra requirement of having lower case letters before upper case ones, you can rely on the fact that in ASCII, 'x' is 'X' ^ 32 (and 'X' is 'x' ^ 32):

$ perl -pe 's{^rap ,\K\S+}{join "", sort {(ord($a)^32) <=> (ord($b)^32)} split //, $&}e' file
rap ,fglX MIT X11
rap ,flBPX 'MITER'
rap ,fglB share git-grep
rap ,bfl X11
rap ,flB xzfgrep
rap ,fB X11
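
The XOR trick can be checked from the shell. This is only a small sketch for ASCII letters; flip_case is a made-up helper name, not part of the perl command above. It shows that XOR-ing a letter's code with 32 flips its case, which is why sorting on ord ^ 32 puts the lowercase letters (97-122, mapped to 65-90) ahead of the uppercase ones (65-90, mapped to 97-122).

```shell
# flip_case is a hypothetical helper: it prints the character whose
# ASCII code is the code of its argument XOR 32.
flip_case() {
    code=$(printf '%d' "'$1")      # a leading ' makes printf print the character's numeric code
    # Convert the flipped code back to a character through an octal escape.
    printf "\\$(printf '%03o' "$(( code ^ 32 ))")\n"
}

flip_case x    # prints X  (120 ^ 32 = 88)
flip_case B    # prints b  (66 ^ 32 = 98)
```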
  • Marvelous. Gotta learn perl. Commented Apr 20, 2024 at 19:28

You could use perl to capture a sequence of word characters following a comma, split the result into an array, sort that and substitute the result:

$ perl -pe 's{(?<=,)(\w+)}{join "", sort split(//, $1)}e' yourfile
rap ,Xfgl MIT X11
rap ,BPXfl 'MITER'
rap ,Bfgl share git-grep
rap ,bfl X11
rap ,Bfl xzfgrep
rap ,Bf X11

As requested, here's one (probably suboptimal) way to sort all lowercase letter options before all uppercase ones:

$ perl -pe 's{(?<=,)(\w+)}{@opts = split(//,$1); join "", (sort grep /[[:lower:]]/,@opts), (sort grep /[^[:lower:]]/, @opts) }e' yourfile
rap ,fglX MIT X11
rap ,flBPX 'MITER'
rap ,fglB share git-grep
rap ,bfl X11
rap ,flB xzfgrep
rap ,fB X11
  • That's really clever! Works perfectly. Never did learn perl, they say there's nothing it can't do. Commented Apr 20, 2024 at 13:30
  • Hey, not to ask for the moon, but would it be possible to do the above but prefer lower case over upper case in the output? I know that 'B' comes before 'b' in the ASCII ranking, but it just so happens that I'd like it with lower case first. Commented Apr 20, 2024 at 13:42
  • @RayAndrews hmm... it's straightforward to make the sort case-insensitive, but beyond that it gets tricky Commented Apr 20, 2024 at 14:30
  • Never mind! It's pretty damn fine as it is. Commented Apr 20, 2024 at 14:57
  • As \w also matches digits and underscores, which match neither [[:lower:]] nor [[:upper:]], it may be better to use [[:lower:]] vs [^[:lower:]] or \p{Ll} vs \P{Ll}. Commented Apr 21, 2024 at 10:23

Using GNU awk for sorted_in and, since we're using gawk anyway, a few other convenient but unnecessary extensions, we can apply the Decorate-Sort-Undecorate idiom: put 1 in front of every lowercase character and 2 in front of every uppercase one, so that all the lowercase ones sort before the uppercase ones, then remove those decorations again before printing:

$ cat tst.awk
BEGIN { PROCINFO["sorted_in"] = "@val_str_asc" }
match( $0, /^(\s*\S+\s*,)(\S+)(.*)/, a ) {
    gsub( /[[:lower:]]/, "1 &,", a[2] )     # Decorate
    gsub( /[[:upper:]]/, "2 &,", a[2] )
    sorted = ""
    split(a[2],opts,",")
    for ( idx in opts ) {                   # Sort
        sorted = sorted opts[idx]
    }
    gsub( /[[:digit:] ,]/, "", sorted )     # Undecorate
    $0 = a[1] sorted a[3]
}
{ print }

$ awk -f tst.awk file
rap ,fglX MIT X11
rap ,flBPX 'MITER'
rap ,fglB share git-grep
rap ,bfl X11
rap ,flB xzfgrep
rap ,fB X11
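
For comparison, the same Decorate-Sort-Undecorate idea can be sketched with standard shell utilities instead of gawk. This is a rough illustration, not a replacement for the script above: sort_opts_lower_first is a made-up name, it handles one option word at a time (without the leading comma), and it assumes the options are plain ASCII letters.

```shell
# sort_opts_lower_first is a hypothetical helper: it takes one option
# word (e.g. XPBfl) and prints its letters sorted, lowercase first.
sort_opts_lower_first() {
    printf '%s\n' "$1" |
        fold -w1 |                                        # one character per line
        sed -e 's/^[[:lower:]]/1&/' -e t -e 's/^/2/' |    # Decorate: 1 = lowercase, 2 = anything else
        LC_ALL=C sort |                                   # Sort on the decorated lines
        cut -c2- |                                        # Undecorate: drop the leading digit
        tr -d '\n'                                        # join the characters back together
    echo
}

sort_opts_lower_first XPBfl    # prints flBPX
sort_opts_lower_first Bf       # prints fB
```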

Using Raku (formerly known as Perl_6)

Alphabetize letter options in ASCII-order (uppercase first):

~$ raku -pe 's{ <?after \, > (\w+) } = "{$0.comb.sort.join}";' file 

Raku is a programming language in the Perl-family that features high-level support for Unicode built-in. Above is a Raku translation of excellent Perl code by @steeldriver (with a nod to @StéphaneChazelas).

  • The s/// form can be written s{ … } = " … ". Curlies in the replacement half (inside the doublequotes) signify a code block that runs on the capture.
  • Captures in Raku start from $0.
  • A positive lookbehind "Y-after-X" in Raku is written <?after X > Y (the negative uses a ! instead of a ?).
  • Raku implements a positive (global) selector function comb, that--absent a regex argument--breaks on every character.

Sample Input:

rap ,Xflg MIT X11
rap ,XPBfl 'MITER'
rap ,Bflg share git-grep
rap ,bfl X11
rap ,Bfl xzfgrep
rap ,Bf X11

Sample Output (1):

rap ,Xfgl MIT X11
rap ,BPXfl 'MITER'
rap ,Bfgl share git-grep
rap ,bfl X11
rap ,Bfl xzfgrep
rap ,Bf X11

Alphabetize letter options with lowercase first:

~$ raku -pe 's{ <?after \, > (\w+) } = "{ my (@l,@u); $0.comb.map({ /<:Ll>/ ?? @l.push($_) !! @u.push($_) }); join "", @l.sort,@u.sort }";' file 

OR:

~$ raku -pe 's{ <?after \, > (\w+) } = "{ $0.comb.classify: { /<:Ll>/ ?? "lc" !! "uc" }, :into( my %case{Str} ); $_.join given %case.sort.map: *.value.sort.join }";' file 

Sample Output (2):

rap ,fglX MIT X11
rap ,flBPX 'MITER'
rap ,fglB share git-grep
rap ,bfl X11
rap ,flB xzfgrep
rap ,fB X11

Above uses the Unicode representation of lowercase letter, <:Ll> (see first link below for details). Also on display is Raku's ternary operator: Test ?? True !! False (see second link below for details). The two answers differ in that the first uses two arrays (directly) with a ternary, while the second uses a single hash (created via classify).


NOTE: Thanks to a comment by @StéphaneChazelas, it's probably better to use (<alpha>+) as the capture, instead of (\w+).

https://docs.raku.org/language/regexes#Predefined_character_classes
https://docs.raku.org/language/operators#infix_??_!!
https://docs.raku.org/routine/classify
https://docs.raku.org/language/regexes
https://raku.org

  • Thanks jubilatious, but not wandering too far from home, I'll set my sights on getting to know perl. Commented Apr 20, 2024 at 19:29

If we replace the comma in the input file with a dash, we can use getopts as usual to parse the rap function's options.

That change can be done with sed, and assuming we only ever need to change rap , at the start of any line to rap -, it would look like this:

sed 's/^rap ,/rap -/' file.in >file 

We would then be able to simply source the generated file in our script with . ./file assuming the rap function had been previously declared.
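
As a quick check of what that sed command changes, here is a small sketch. The to_dashes wrapper name and the third sample line (with a comma inside the argument) are made up for illustration; the other lines come from the question. Only the comma right after a leading rap is rewritten.

```shell
# to_dashes is a hypothetical wrapper around the sed command above;
# it reads test commands on stdin and writes the converted ones to stdout.
to_dashes() {
    sed 's/^rap ,/rap -/'
}

printf '%s\n' 'rap ,Bf X11' "rap ,XPBfl 'MITER'" 'rap ,Bf a,b' | to_dashes
# prints:
#   rap -Bf X11
#   rap -XPBfl 'MITER'
#   rap -Bf a,b
```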

To parse the options in the rap function:

rap () {
    OPTIND=1
    unset -v B_flag P_flag X_flag
    unset -v b_flag f_flag g_flag l_flag

    while getopts BPXbfgl opt; do
        case $opt in
            B) B_flag=true ;;
            P) P_flag=true ;;
            X) X_flag=true ;;
            b) b_flag=true ;;
            f) f_flag=true ;;
            g) g_flag=true ;;
            l) l_flag=true ;;
            *) echo 'Error' >&2; return 1
        esac
    done
    shift "$(( OPTIND - 1 ))"

    # Act on set flags here.
    if "${f_flag-false}"; then
        echo 'The -f option was used'
    fi

    # The non-options are available in "$@".
    printf 'Other argument: %s\n' "$@"
    printf -- '---\n'
}

Note that by setting the flag variables in the while loop and acting on them after the loop, we avoid acting on duplicated options multiple times.
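
To illustrate that point, here is a cut-down sketch. demo_rap is a hypothetical stand-in for rap that knows only -B and -f and simply reports which flags ended up set. Option order and repeated options make no difference to the result, which is the same property the question is trying to get by sorting the option letters.

```shell
# demo_rap is a hypothetical, reduced version of rap with only -B and -f.
demo_rap() {
    OPTIND=1                      # restart option parsing on every call
    unset -v B_flag f_flag
    while getopts Bf opt; do
        case $opt in
            B) B_flag=true ;;
            f) f_flag=true ;;
            *) echo 'Error' >&2; return 1 ;;
        esac
    done
    shift "$(( OPTIND - 1 ))"
    # Report the final state of the flags and the remaining arguments.
    printf 'B=%s f=%s args=%s\n' "${B_flag-false}" "${f_flag-false}" "$*"
}

demo_rap -Bf X11          # prints B=true f=true args=X11
demo_rap -fB X11          # prints B=true f=true args=X11
demo_rap -f -B -f X11     # prints B=true f=true args=X11
```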
