
I'm trying to generalize:

$ awk -F":" '{ print $7 ":" $1 }' /etc/passwd 

into a script, with delimiter, input file and selection of columns provided from command line arguments, something like:

#! /bin/bash

# parse command line arguments into variables `delimiter`, `cols` and `inputfile`
...

awk -F"$delimiter" '{ print '"$cols"' }' "$inputfile"

Input is from a file, so input on standard input should also work. I would prefer specifying the columns as separate arguments, in order. The output delimiter is the same as the input delimiter, as in the example command.

How would you write such a script?

  • Related: Environment variable not expanded inside the command line argument Commented Jul 22, 2018 at 17:48
  • See: Difference between single and double quotes in bash Commented Jul 22, 2018 at 17:51
  • @Tim How is this different from cut? How would you want the command line to look? Whatever it looks like, it's going to be a wrapper around cut, not awk. Commented Jul 22, 2018 at 21:24
  • @Tim Wrapping a general awk command cannot be done. Wrapping a specific awk command is easy. In this case, though, the specific awk command degenerates to the cut utility, and the only thing the wrapper needs to do is sort out the command line arguments. If these are in the same form as with cut, then no wrapper is needed. Commented Jul 22, 2018 at 22:13
  • @Tim Well, that's a good point that wasn't mentioned in the question. You should add it there. Commented Jul 22, 2018 at 22:39
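For context, the point raised in these comments is that cut always emits fields in their original input order, so it cannot swap $7 and $1 the way the awk one-liner does. A quick illustration (sample input is mine):

```shell
# cut ignores the order given to -f and prints fields in file order,
# so -f3,1 behaves the same as -f1,3
printf 'a:b:c\n' | cut -d: -f3,1
# prints: a:c  (not c:a)
```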

3 Answers


You can use bash's getopts (you have to scroll down a little bit) to do some command line parsing:

#!/bin/bash

delimiter=:
first=1
second=2

while getopts d:f:s: FLAG; do
    case $FLAG in
        d) delimiter=$OPTARG;;
        f) first=$OPTARG;;
        s) second=$OPTARG;;
        *) echo error >&2; exit 2;;
    esac
done

shift $((OPTIND-1))

awk -F"$delimiter" -v "OFS=$delimiter" -v first="$first" -v second="$second" '{ print $first OFS $second }' "$@"
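The core trick of this script, isolated into a one-liner (the sample input is mine): field numbers are passed in with -v and dereferenced inside awk as $first and $second.

```shell
# pass field numbers as awk variables, then use them as field indices
printf 'a:b:c\n' | awk -F: -v OFS=: -v first=3 -v second=1 '{ print $first OFS $second }'
# prints: c:a
```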
  • Thanks. There can be an arbitrary number of fields being selected. Commented Jul 22, 2018 at 19:38
  • You ought to pass first and second using -v too. Commented Jul 22, 2018 at 19:42
  • If you want to select arbitrary fields, you are very close to putting the whole awk script into a bash variable. It would then have to be given on the command line to your bash script, at which point you could just as well type out the literal awk command. In this sense, awk itself is the maximal generalisation of the bash script you are looking for. Commented Jul 22, 2018 at 20:29
  • awk -v i=7 '{ print $i }' Commented Jul 22, 2018 at 21:20
  • @Kusalananda Cool, I didn't know that. Commented Jul 22, 2018 at 21:35
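The -v technique mentioned in these comments also extends to an arbitrary number of fields: join the field numbers into one comma-separated string, pass it in with -v, and let awk split it once. A minimal sketch, reading standard input (script name and variable names are my choices, not from the answer):

```shell
#!/bin/bash
# Sketch: print an arbitrary list of fields from stdin.
# Usage: fields.sh [-d delim] fieldnum...   e.g.  fields.sh -d : 7 1 </etc/passwd
delimiter=:
while getopts d: FLAG; do
    case $FLAG in
        d) delimiter=$OPTARG ;;
        *) echo 'usage: fields.sh [-d delim] fieldnum...' >&2; exit 2 ;;
    esac
done
shift $((OPTIND - 1))

# join the remaining arguments with commas: "7 1" becomes "7,1"
cols=$(IFS=,; printf '%s' "$*")

awk -F"$delimiter" -v OFS="$delimiter" -v cols="$cols" '
    BEGIN { n = split(cols, want, ",") }   # parse the field list once
    {
        out = $(want[1])
        for (i = 2; i <= n; i++)
            out = out OFS $(want[i])
        print out
    }'
```

This keeps the field numbers out of the awk program text itself, so no shell code is interpolated into the script.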

The following shell script takes an optional -d option to set the delimiter (tab is the default), as well as a mandatory -c option with a column specification.

The column specification is similar to that of cut but also allows for rearranging and duplicating the output columns, as well as specifying ranges backwards. Open ranges are also supported.

The file to parse is given on the command line as the last operand, or passed on standard input.

#!/bin/sh

delim='\t'      # tab is default delimiter

# parse command line option
while getopts 'd:c:' opt; do
    case $opt in
        d) delim=$OPTARG ;;
        c) cols=$OPTARG ;;
        *) echo 'Error in command line parsing' >&2
           exit 1
    esac
done
shift "$(( OPTIND - 1 ))"

if [ -z "$cols" ]; then
    echo 'Missing column specification (the -c option)' >&2
    exit 1
fi

# ${1:--} will expand to the filename or to "-" if $1 is empty or unset
cat "${1:--}" | awk -F "$delim" -v cols="$cols" '
    BEGIN {
        # output delim will be same as input delim
        OFS = FS

        # get array of column specs
        ncolspec = split(cols, colspec, ",")
    }

    {
        # get fields of current line
        # (need this as we are rewriting $0 below)
        split($0, fields, FS)
        nf = NF         # save NF in case we have an open-ended range
        $0 = ""         # empty $0

        # go through given column specification and
        # create a record from it
        for (i = 1; i <= ncolspec; ++i)
            if (split(colspec[i], r, "-") == 1)
                # single column spec
                $(NF+1) = fields[colspec[i]]
            else {
                # column range spec
                if (r[1] == "") r[1] = 1        # open start range
                if (r[2] == "") r[2] = nf       # open end range

                if (r[1] < r[2])
                    # forward range
                    for (j = r[1]; j <= r[2]; ++j)
                        $(NF + 1) = fields[j]
                else
                    # backward range
                    for (j = r[1]; j >= r[2]; --j)
                        $(NF + 1) = fields[j]
            }

        print
    }'

There's a slight inefficiency here, as the code re-parses the column specification for each input line. If support for open-ended ranges is not needed, or if all lines can be assumed to have exactly the same number of columns, a single pass over the specification in the BEGIN block (or in a separate NR==1 block) could build an array of the fields that should be output.
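The one-pass idea can be sketched as follows, assuming no open-ended ranges so that the specification can be fully expanded up front (the array name outfield is my choice):

```shell
# Sketch: expand a column spec like "3-1,2" once in BEGIN into an
# outfield[] array, then simply index that array for every input line.
printf '1:2:3\na:b:c\n' | awk -F: -v cols='3-1,2' '
    BEGIN {
        OFS = FS
        n = split(cols, spec, ",")
        m = 0
        for (i = 1; i <= n; i++) {
            k = split(spec[i], r, "-")
            if (k == 1)
                outfield[++m] = spec[i]          # single column
            else {
                step = (r[1] <= r[2]) ? 1 : -1   # forward or backward range
                for (j = r[1]; j != r[2] + step; j += step)
                    outfield[++m] = j
            }
        }
    }
    {
        line = $(outfield[1])
        for (i = 2; i <= m; i++)
            line = line OFS $(outfield[i])
        print line
    }'
# prints:
# 3:2:1:2
# c:b:a:b
```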

Missing: Sanity check for column specification. A malformed specification string may well cause weirdness.

Testing:

$ cat file
1:2:3
a:b:c
@:(:)

$ sh script.sh -d : -c 1,3 <file
1:3
a:c
@:)

$ sh script.sh -d : -c 3,1 <file
3:1
c:a
):@

$ sh script.sh -d : -c 3-1,1,1-3 <file
3:2:1:1:1:2:3
c:b:a:a:a:b:c
):(:@:@:@:(:)

$ sh script.sh -d : -c 1-,3 <file
1:2:3:3
a:b:c:c
@:(:):)
  • Thanks. I wrote a script; if I posted it, could you give some constructive advice and compare it to your script? Commented Jul 23, 2018 at 13:09
  • @Tim Use site's chat, post a link to it there, I'll look at it when I'm back at a computer. Commented Jul 23, 2018 at 13:16
  • Thanks. unix.stackexchange.com/a/457939/674 Commented Jul 23, 2018 at 13:21

Thanks for the replies. Here is my script. I created it by trial and error, which doesn't often lead to a working solution; I don't yet have a systematic way of coming up with scripts, which is what I always aim for. Please provide some code review if you can. Thanks.

The script works in the following examples (I'm not sure if it works in general):

$ projection -d ":" /etc/passwd 4 3 6 7
$ projection -d "/" /etc/passwd 4 3 6 7

Script projection is:

#! /bin/bash

# default arg value
delim=","       # CSV by default

# Parse flagged arguments:
while getopts "td:" flag
do
    case $flag in
        d) delim=$OPTARG;;
        t) delim="\t";;
        ?) exit;;
    esac
done

# Delete the flagged arguments:
shift $(($OPTIND -1))

inputfile="$1"
shift 1
fs=("$@")

# prepend "$" to each field number
fields=()
for f in "${fs[@]}"; do
    fields+=(\$"$f")
done

awk -F"$delim" "{ print $(join_by.sh " \"$delim\" " "${fields[@]}") }" "$inputfile"

where join_by.sh is

#! /bin/bash

# https://stackoverflow.com/questions/1527049/join-elements-of-an-array
# https://stackoverflow.com/a/2317171/

# get the separator:
d="$1"; shift

# interpolate the remaining parameters with the separator,
# treating the first parameter specially
echo -n "$1"; shift
printf "%s" "${@/#/$d}"
  • Your shell script is the same as (IFS="$delim"; echo "${fields[*]}"). Commented Jul 23, 2018 at 16:21
  • I dislike the fact that you inject shell code into the awk script. It would be safer to pass a list of field numbers as a string, and then let awk do a tiny bit of looping. Commented Jul 23, 2018 at 16:23
  • @Kusalananda (IFS="$delim"; echo "${fields[*]}") works only when the delimiter is a single character, not when it is a string. Or am I wrong? Commented Jul 25, 2018 at 17:42
  • No, that's correct, only the first character of IFS will be used. Commented Jul 25, 2018 at 17:49
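The one-line replacement for join_by.sh suggested in the first comment can be demonstrated directly, along with the single-character caveat from the follow-up comments:

```shell
# join array elements using IFS in a subshell instead of a helper script
fields=('$4' '$3' '$6')
delim=:
joined=$(IFS="$delim"; echo "${fields[*]}")
echo "$joined"
# prints: $4:$3:$6
# Caveat: "${fields[*]}" joins with only the FIRST character of IFS,
# so this is not a substitute when the delimiter is a multi-character string.
```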
