how to get awk to provide the next column after match

Question

I have the following file (somefile.txt):

/A/1/B/1/C/1/D/1/E/1/F/2/G/1/H/1/I/1/J/1/K/1/
/B/1/C/1/D/1/E/1/F/5/G/1/H/1/I/1/J/1/K/1/
/C/1/D/1/E/1/F/9/G/1/H/1/I/1/J/1/K/1/
/D/1/E/1/F/7/G/1/H/1/I/1/J/1/K/1/
/A/1/B/1/C/1/D/1/E/1/F/8/G/1/H/1/I/1/J/1/K/1/
/A/1/B/1/C/1/D/1/E/1/F/3/G/1/H/1/I/1/J/1/K/1/
/A/1/B/1/C/1/D/1/E/1/F/6/G/1/H/1/I/1/J/1/K/1/
/B/1/C/1/D/1/E/1/F/8/G/1/H/1/I/1/J/1/K/1/
/D/1/E/1/F/3/G/1/H/1/I/1/J/1/K/1/
/C/1/D/1/E/1/F/6/G/1/H/1/I/1/J/1/K/1

I am looking to get the following result (the next number after F):

2
5
9
7
8
3
6
8
3
6

Given that the number of columns per line varies, is there a way I can do something like the following?

awk -F'/' '/F/ {print <column_of_match> + 1 }' somefile.txt

You can loop over the range of fields in awk. For example: awk -F'/' '/F/ { for (i=1;i<=NF;i++) if ($i ~ "F") print $(i+1) }' infile, where infile is your input file.
– Valentin Bajrami, Commented Dec 9, 2022 at 20:38
Keep it simple: sed -nr -e 's^.*F/([0-9]+).*^\1^p' I love sed, but you may not need it (if that is all you are doing; hard to tell).
– ctrl-alt-delor, Commented Dec 10, 2022 at 9:40
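
Note that $i ~ "F" in the loop from the first comment is a regex match, so it would also fire on a field such as FOO. A minimal sketch of the same loop with an exact comparison follows. The function name next_after and the awk variable key are illustrative and do not come from the thread:

```shell
# next_after KEY [FILE...]: print the field that follows each field equal to KEY.
# Both the function name and the "key" variable are hypothetical names.
next_after() {
    key=$1
    shift
    awk -F'/' -v key="$key" '
        {
            # Stop at NF-1 so that $(i+1) always refers to an existing field.
            for (i = 1; i < NF; i++)
                if ($i == key)
                    print $(i + 1)
        }
    ' "$@"
}

# Prints 2 and 17, one per line.
printf '%s\n' '/A/1/B/1/F/2/G/1/' '/D/1/E/1/F/17/G/1/' | next_after F
```

Because every field is inspected, a line with several F fields produces one output line per occurrence.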

cas · Accepted Answer · 2022-12-10 08:39:52Z

With perl, because array slices are convenient and so is the ability to treat each pair of elements in an array as the key & value of a hash:

$ perl -F/ -lane '%f = @F[1..$#F]; print $f{F}' input.txt
2
5
9
7
8
3
6
8
3
6

Perl's -F and -a (autosplit) work similarly to awk - but instead of auto-splitting the line into $1, $2, $3, etc, it auto-splits each line into an array called @F.

The script converts a slice of array @F (all but the zeroth element) into a hash (associative array) called %f, and prints the element of %f with key 'F'.

To highlight what this does/how it works (and why we needed to exclude the empty string zeroth element of @F), here's what @F and %f look like when using the Data::Dump module's dump function:

$ perl -F/ -MData::Dump=dump -lane '
    %f = @F[1..$#F];
    print join("\n", $_, dump(@F), dump(\%f), $f{F}), "\n"' input.txt
/A/1/B/1/C/1/D/1/E/1/F/2/G/1/H/1/I/1/J/1/K/1/
("", "A", 1, "B", 1, "C", 1, "D", 1, "E", 1, "F", 2, "G", 1, "H", 1, "I", 1, "J", 1, "K", 1)
{ A => 1, B => 1, C => 1, D => 1, E => 1, F => 2, G => 1, H => 1, I => 1, J => 1, K => 1 }
2

/B/1/C/1/D/1/E/1/F/5/G/1/H/1/I/1/J/1/K/1/
("", "B", 1, "C", 1, "D", 1, "E", 1, "F", 5, "G", 1, "H", 1, "I", 1, "J", 1, "K", 1)
{ B => 1, C => 1, D => 1, E => 1, F => 5, G => 1, H => 1, I => 1, J => 1, K => 1 }
5

/C/1/D/1/E/1/F/9/G/1/H/1/I/1/J/1/K/1/
("", "C", 1, "D", 1, "E", 1, "F", 9, "G", 1, "H", 1, "I", 1, "J", 1, "K", 1)
{ C => 1, D => 1, E => 1, F => 9, G => 1, H => 1, I => 1, J => 1, K => 1 }
9

...and so on...

Note: this will print a blank line if there is no F in the input. If that's not what you want, do something like:

perl -F/ -lane '%f = @F[1..$#F]; if (defined $f{F}) { print $f{F} } else { print STDERR "Error on input line $.: F has absconded" }' input.txt
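
For comparison, the same key/value idea can be sketched in awk itself, assuming the fields always come in name/value pairs after the leading empty field. The function name f_from_pairs and the array name kv are made up for this illustration, and lines without an F are skipped silently rather than reported:

```shell
# f_from_pairs [FILE...]: treat each line as /name/value/name/value/...
# and print the value stored under the name "F". Names here are hypothetical.
f_from_pairs() {
    awk -F'/' '
        {
            split("", kv)    # portable idiom for emptying the array
            # $1 is the empty string before the leading "/", so pairs start at $2.
            for (i = 2; i < NF; i += 2)
                kv[$i] = $(i + 1)
            if ("F" in kv)
                print kv["F"]
        }
    ' "$@"
}

# Prints 2 and 6, one per line; the middle line has no F and is skipped.
printf '%s\n' '/A/1/F/2/G/1/' '/A/1/B/1/' '/C/1/F/6' | f_from_pairs
```

As with the Perl hash, a repeated name on one line overwrites the earlier value, so only the last F survives.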

Stewart · Accepted Answer · 2022-12-10 12:14:39Z

Here's an answer using sed:

$ sed -n 's|.*F/\([0-9]\).*|\1|p' <<EOF
/A/1/B/1/C/1/D/1/E/1/F/2/G/1/H/1/I/1/J/1/K/1/
/B/1/C/1/D/1/E/1/F/5/G/1/H/1/I/1/J/1/K/1/
/C/1/D/1/E/1/F/9/G/1/H/1/I/1/J/1/K/1/
/D/1/E/1/F/7/G/1/H/1/I/1/J/1/K/1/
/A/1/B/1/C/1/D/1/E/1/F/8/G/1/H/1/I/1/J/1/K/1/
/A/1/B/1/C/1/D/1/E/1/F/3/G/1/H/1/I/1/J/1/K/1/
/A/1/B/1/C/1/D/1/E/1/F/6/G/1/H/1/I/1/J/1/K/1/
/B/1/C/1/D/1/E/1/F/8/G/1/H/1/I/1/J/1/K/1/
/D/1/E/1/F/3/G/1/H/1/I/1/J/1/K/1/
/C/1/D/1/E/1/F/6/G/1/H/1/I/1/J/1/K/1
EOF
2
5
9
7
8
3
6
8
3
6

Explanation of -n 's|.*F/\([0-9]\).*|\1|p':

-n means don't print anything unless explicitly told to
The trailing p in the expression says: "if this expression was matched, print this line". That means lines without an F/[0-9] will not be printed.
s|foo|bar| in the expression means: substitute foo with bar. You usually see it as s/foo/bar/, but since we have a / in the expression, I used | to avoid escaping it.
In the match part (foo):
- .*F/[0-9].* means: all lines with an F/ then a digit.
- .*F/\([0-9]\).* means: match a whole line containing an F/ then a digit, but remember that digit
In the substitute part (bar):
- \1 refers to that digit we remembered.

In short:

Find every line matching .*F/[0-9].* and replace the whole line with only the digit.

If multi-digit positive integers are possible, then the expression can be easily adapted:

sed -n 's|.*/F/\([0-9]\+\)/.*|\1|p'
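
The \+ operator in that expression is a GNU extension to basic regular expressions. A sketch of a portable variant follows, assuming a sed that supports the POSIX interval syntax \{1,\}. The function name f_value is hypothetical. The sample input also shows that the leading .* is greedy, so on a line with several F fields the last one wins:

```shell
# f_value [FILE...]: print the integer that follows /F/ on each matching line.
# \{1,\} is the POSIX BRE spelling of "one or more".
f_value() {
    sed -n 's|.*/F/\([0-9]\{1,\}\).*|\1|p' "$@"
}

# Prints 13 and 42, one per line; the line without F prints nothing,
# and the last line reports its final F because the leading .* is greedy.
printf '%s\n' '/E/1/F/13/G/1/' '/A/1/B/1/' '/F/6/K/1/F/42' | f_value
```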

Paul_Pedant · Accepted Answer · 2022-12-10 00:02:44Z

Just use a pattern that matches the separators and F, split that substring into an array, and print that subfield.

Tested code:

$ awk 'match ($0, "/F/[^/]/") { split (substr ($0, RSTART, RLENGTH), V, "/"); print V[3]; }' Match.txt

No need to iterate over fields, or to use two processes.

You could also just cut out the part you need without the split, by adjusting the string indexing, but that makes it less general and more prone to an off-by-one error.

awk 'match ($0, "/F/[^/]/") { print substr ($0, RSTART+3, RLENGTH-4); }' Match.txt
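
Both commands match exactly one character after /F/ because of [^/]. If values may be longer than one character, the pattern can be loosened. A minimal sketch under that assumption, with f_match as a made-up function name:

```shell
# f_match [FILE...]: print whatever sits between "/F/" and the next "/"
# (or the end of the line) for the first F field on each line.
f_match() {
    awk 'match($0, "/F/[^/]+") {
        # Skip the 3 characters of "/F/" and keep the rest of the match.
        print substr($0, RSTART + 3, RLENGTH - 3)
    }' "$@"
}

# Prints 13 and 7, one per line.
printf '%s\n' '/E/1/F/13/G/1/' '/K/1/F/7' | f_match
```

Dropping the trailing / from the pattern also lets it match an F field at the very end of a line that has no closing slash.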

Ed Morton · Accepted Answer · 2022-12-10 03:41:01Z

Using GNU awk for multi-char RS:

$ awk -v RS='[/\n]+' 'f{print; f=0} /F/{f=1}' file
2
5
9
7
8
3
6
8
3
6
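
The idea is that every slash-separated field becomes its own record, and a flag set on the F record causes the following record to be printed. Where GNU awk is not available, roughly the same effect can be sketched with tr doing the splitting. The function name f_after_tr is hypothetical, it reads standard input only, and it compares against F exactly instead of using a regex:

```shell
# f_after_tr: read lines on stdin and print the field that follows each "F".
f_after_tr() {
    # Put every field on its own line; -s squeezes runs of newlines so that
    # the empty fields around the leading and trailing "/" mostly disappear.
    tr -s '/' '\n' |
        awk 'f { print; f = 0 } $0 == "F" { f = 1 }'
}

# Prints 2 and 5, one per line.
printf '%s\n' '/A/1/F/2/G/1/' '/B/1/F/5/G/1' | f_after_tr
```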

Baba · Accepted Answer · 2022-12-11 11:02:43Z

This is a possible solution to your problem. It uses awk twice: once to split at the right place, and once to grab the number and print it.

Here is the script:

awk -F "/F/" '{print $2}' prova.txt | awk -F "/" '{print $1}'

In the first part we split the input line on /F/, so that the second field begins with the number we are looking for. In the second part of the script we isolate this number by splitting on / and printing the first field.

This works when there is at most one F per line. It even works when there is no F, as it will just print an empty line.
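
The two stages can also be folded into a single awk process by calling split() on the second field. A minimal sketch, with f_split_once and parts as illustrative names. Unlike the pipeline, it skips lines without an F instead of printing an empty line:

```shell
# f_split_once [FILE...]: split each line on the literal string "/F/",
# then keep the part of the second field that precedes the next "/".
f_split_once() {
    awk -F'/F/' 'NF > 1 {
        split($2, parts, "/")
        print parts[1]
    }' "$@"
}

# Prints 2 and 33, one per line; the middle line has no F and is skipped.
printf '%s\n' '/A/1/F/2/G/1/' '/A/1/B/1/' '/D/1/F/33' | f_split_once
```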

jubilatious1 · Accepted Answer · 2024-02-21 10:31:34Z

Using Raku (formerly known as Perl_6)

Answer for first target value (e.g. "F") per line:

~$ raku -ne 'my @a = .split("/", :skip-empty); my $idx = @a.grep(/F/, :k); put @a[$idx.first + 1] ;' file

#OR

~$ raku -ne 'my @a = .comb(/ <-[/]>+ /); my $idx = @a.grep(/F/, :k); put @a[$idx.first + 1];' file

Answer for multiple target values (e.g. "F") per line:

~$ raku -ne 'my @a = .split("/", :skip-empty); my @idx = @a.grep(/F/, :k); put @a[ @idx.map: * + 1 ] ;' file

#OR

~$ raku -ne 'my @a = .comb(/ <-[/]>+ /); my @idx = @a.grep(/F/, :k); put @a[ @idx.map: * + 1 ] ;' file

Briefly for all four (4) answers, Raku is called at the command line with -ne non-autoprinting linewise flags:

First answer of each pair: lines are read in and .split on / SOLIDUS, with :skip-empty dropping the empty elements that the split creates.
Second answer of each pair: Alternatively, .comb is used, which culls out elements matching the pattern <-[/]>+, i.e. runs of one-or-more characters consisting of any character except / SOLIDUS.
Either way, resultant elements are stored in @a array.

The top two answers return the answer for the first target value (e.g. "F") per line. They take the @a array and grep for the desired match (F), returning :k (the index positions of matches found) and storing these values in $idx. When $idx is used to index the array, first or a similar function must be called on it, and the result is then incremented by + 1.

The bottom two answers return the answer for all target values (e.g. "F") per line. They take @a array and grep for the desired match (F), returning :k (the index position of matches found) and storing these values in @idx. Here all values of @idx are incremented by + 1 using a map construct: @idx.map: * + 1. Alternatively, @idx.map({ $_ + 1 }) can be used instead.

Sample Input:

/A/1/B/1/C/1/D/1/E/1/F/2/G/1/H/1/I/1/J/1/K/1/
/B/1/C/1/D/1/E/1/F/5/G/1/H/1/I/1/J/1/K/1/
/C/1/D/1/E/1/F/9/G/1/H/1/I/1/J/1/K/1/
/D/1/E/1/F/7/G/1/H/1/I/1/J/1/K/1/
/A/1/B/1/C/1/D/1/E/1/F/8/G/1/H/1/I/1/J/1/K/1/
/A/1/B/1/C/1/D/1/E/1/F/3/G/1/H/1/I/1/J/1/K/1/
/A/1/B/1/C/1/D/1/E/1/F/6/G/1/H/1/I/1/J/1/K/1/
/B/1/C/1/D/1/E/1/F/8/G/1/H/1/I/1/J/1/K/1/
/D/1/E/1/F/3/G/1/H/1/I/1/J/1/K/1/
/C/1/D/1/E/1/F/6/G/1/H/1/I/1/J/1/K/1

Sample Output (all four answers):

2
5
9
7
8
3
6
8
3
6

Again, the bottom two answers will correctly handle lines with multiple target values per line. So if an extra line is added to the bottom of the Sample Input like:

/C/1/D/1/E/1/F/6/G/1/H/1/I/1/J/1/K/1/F/13

...the bottom two answers correctly return 6 13, which are the numbers following "F" on that line.

https://docs.raku.org/language/list#Arrays
https://docs.raku.org/routine/comb
https://raku.org

canupseq · Accepted Answer · 2024-02-21 11:48:59Z

You can use grep and cut commands:

$ grep -o 'F\/[0-9]' input_file | cut -c3
2
5
9
7
8
3
6
8
3
6
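
Since cut -c3 keeps a single character, this only handles one-digit values. A sketch of a variant for longer integers, assuming a grep that supports -o (GNU and BSD grep do). The function name f_grep_cut is hypothetical and it reads standard input only:

```shell
# f_grep_cut: print every integer that directly follows "/F/" on stdin.
f_grep_cut() {
    # The leading "/" keeps names that merely end in F (e.g. AF) from matching.
    # Each match looks like "/F/13", so the value is the third "/"-separated field.
    grep -o '/F/[0-9][0-9]*' | cut -d/ -f3
}

# Prints 2 and 13, one per line.
printf '%s\n' '/A/1/F/2/G/1/' '/E/1/F/13' | f_grep_cut
```

Because grep -o emits each match on its own line, a line with several F fields yields one output line per field.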
