I have a file in the following format:

    field1|field2|field3
    field1|"field2|field2"|field3

Notice that the second row contains double quotes: the string within the double quotes belongs to field 2. How do I extract this using awk? I've been googling with no results. I also tried this, with no luck:

    FS='"| "|^"|"$' '{print $2}'

3 Answers


If you have a recent version of gawk, you're in luck: there's the FPAT feature, documented in the gawk manual under "Defining Fields by Content".

    awk 'BEGIN { FPAT = "([^|]+)|(\"[^\"]+\")" }
    {
        print "NF = ", NF
        for (i = 1; i <= NF; i++) {
            sub(/"$/, "", $i); sub(/^"/, "", $i)
            printf("$%d = %s\n", i, $i)
        }
    }' file

Output:

    NF =  3
    $1 = field1
    $2 = field2
    $3 = field3
    NF =  3
    $1 = field1
    $2 = field2|field2
    $3 = field3
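FPAT is a gawk extension, so other awks such as mawk do not have it. As a sketch for that case, assuming only POSIX awk features (substr, length, split), the hypothetical shell function print_field below walks each line one character at a time, tracks whether it is inside double quotes, and splits only on unquoted | characters. It also treats a doubled quote ("") inside a quoted field as a literal quote, following the usual CSV convention.

```shell
# Hypothetical helper: print field N ($1) of every pipe-delimited line on
# stdin, treating a double-quoted section as part of a single field.
# Only POSIX awk features are used, so FPAT is not required.
print_field() {
    awk -v want="$1" '
    {
        split("", f)              # clear fields left over from the previous line
        nf = 1; f[1] = ""; inq = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (inq && c == "\"" && substr($0, i + 1, 1) == "\"") {
                f[nf] = f[nf] "\""    # "" inside quotes is a literal quote
                i++
            } else if (c == "\"") {
                inq = !inq            # opening/closing quote: toggle, do not copy
            } else if (c == "|" && !inq) {
                f[++nf] = ""          # unquoted delimiter starts a new field
            } else {
                f[nf] = f[nf] c       # ordinary character, or "|" inside quotes
            }
        }
        print f[want]
    }'
}
```

Usage: print_field 2 < file prints field2 and then field2|field2 for the sample input. Scanning character by character is likely slower than FPAT on large files, but it should run on any POSIX awk.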
  • You can replace + with * (FPAT = "([^|]*)|(\"[^\"]+\")") to handle empty fields, such as ||. Commented Aug 14, 2018 at 19:12
  • Brilliant. However, when I use this on comma-separated files it doesn't cope with escaped (doubled) double quotes inside a field, so I'm using FPAT = "([^,]*)|(\"([^\"]|\"\")*\")". For the pipe delimiter above it would be FPAT = "([^|]*)|(\"([^\"]|\"\")*\")". Commented Jan 10, 2020 at 14:38
  • So, what if I don't have FPAT available? Commented Jan 23, 2020 at 0:23
  • @musicin3d, in that case take a look at Sobrique's perl solution Commented Jan 23, 2020 at 1:44

This is something you get with CSV: if the delimiter is part of a field, the field gets quoted. That makes parsing MUCH harder, because you can't just split on the delimiter.
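To see the problem concretely, here is a small sketch that splits the two sample rows naively on every |. The quoted row comes out with four fields, and $2 is only the fragment "field2 (quote included).

```shell
# Split each sample line naively on every "|", ignoring the double quotes,
# and print the field count followed by the second field.
out=$(printf '%s\n' 'field1|field2|field3' 'field1|"field2|field2"|field3' |
    awk -F'|' '{ print NF ": " $2 }')
printf '%s\n' "$out"
```

This prints "3: field2" for the first row but "4: "field2" for the second.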

Fortunately, if Perl is an option, there is the Text::CSV module, which handles this case:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Text::CSV;

    my $csv = Text::CSV->new( { 'sep_char' => '|' } );
    while ( my $row = $csv->getline( *STDIN ) ) {
        print $row->[1], "\n";
    }

You could probably condense this into a pipeable one-liner if you prefer, something like:

    perl -MText::CSV -e 'print map { $_->[1] . "\n" } @{ Text::CSV->new( { sep_char => "|" } )->getline_all( *ARGV ) };'

You may want to preprocess this data with sed so that awk can parse it more easily. For example:

    $ sed 's/"//g' awktest1.txt
    field1|field2|field3
    field1|field2|field2|field3
    $ sed 's/"//g' awktest1.txt > awktest2.txt
    $ awk 'BEGIN {FS = "|"} ; {print $2}' awktest2.txt
    field2
    field2

But then again, I don't know the nature of the data you are working with.
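If field2|field2 has to stay one field, a variant of this preprocessing idea can hide the delimiters that sit inside quotes before the quotes are removed. The sketch below swaps in awk for the preprocessing step (instead of sed) and defines a hypothetical shell function, field_keep_quoted. It assumes quotes are balanced on each line and, unlike a real CSV parser, simply drops doubled quotes ("") instead of turning them into a literal quote.

```shell
# Hypothetical helper: like the sed approach, strip the quotes before
# splitting on "|", but first hide any "|" that sits inside quotes.
field_keep_quoted() {
    # Pass 1 splits on double quotes: even-numbered pieces lie inside a pair
    # of quotes, so their "|" is swapped for SUBSEP (byte \034 by default).
    # Rebuilding $0 with an empty OFS drops the quotes, as sed 's/"//g' does.
    # Pass 2 splits on "|" as usual and turns SUBSEP back into "|".
    awk -F'"' -v OFS='' '{
        for (i = 2; i <= NF; i += 2) gsub(/\|/, SUBSEP, $i)
        $1 = $1    # force $0 to be rebuilt with the empty OFS
        print
    }' | awk -F'|' -v n="$1" '{
        f = $n
        gsub(SUBSEP, "|", f)
        print f
    }'
}
```

Usage: field_keep_quoted 2 < awktest1.txt should print field2 and then field2|field2.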

  • The idea is explicitly to have field2|field2 as a single field in the second line. Commented Oct 23, 2015 at 15:58
