When NF is used with an FPAT regex, the comma is treated as a field. I prefer using both NF and FPAT:
1) NF: to limit the output to the actual number of fields in the record
2) FPAT: to handle an embedded comma in a quoted field, as in line 3 of the input:

    "Bus Driver, City/Transit",51

3) the awk script is used for several input files that have a different number of columns in record 6; record 6 holds the column names (header) for the contents of the file...
Output from two tests follows. The first, test 1, uses a fixed value for the number of fields; the second, test 2, uses NF for the number of fields.
Using gawk 4.1.4.

    BEGIN {
        FPAT = "(^,)|([^,]+)|(\"[^\"]+\")"
        OFS = "\t"
    }
    NR == 6 {
        for (i = 1; 6 >= i; ++i) {
        #for (i = 1; NF >= i; ++i) {
            colName[i] = $i
            print "Column Name: " colName[i]
        }
        {
            print "", "number of fields: " NF
        }
    }

Input File starting at record 6: NR == 6 {...

    Occupation,States Licensed
    Barber,51
    "Bus Driver, City/Transit",51

The output I expect/want:

    Column Name: Occupation
    Column Name: States Licensed
    number of fields: 2

test 1: for (i = 1; 6 >= i; ++i) {...
The output is correct (what I expect/want), except, of course, for the 4 columns/fields that are not valid but are shown because of the fixed value of 6.

    Column Name: Occupation
    Column Name: States Licensed
    Column Name:
    Column Name:
    Column Name:
    Column Name:
    number of fields: 2

test 2: for (i = 1; NF >= i; ++i) {...

The output is NOT what I expect/want; note that the comma is reported as a field.

    Column Name: Occupation
    Column Name: ,
    Column Name: States Licensed
    number of fields: 3

Use FPAT = "\"[^\"]*\"|[^\",]*" (a possibly empty sequence of non-quotes surrounded by quotes, or a possibly empty sequence of not-comma-or-quotes). Or, more readably:

    gawk -v FPAT='"[^"]*"|[^",]*' '<stuff>'
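
To try this pattern against the sample records from the question, something like the following sketch may help. It assumes gawk is installed (FPAT is a gawk extension). The function name show_header and the variable hdr are made up for illustration. Only three sample lines are fed in, so the header is record 1 here rather than record 6.

```shell
# Sketch only: needs gawk, since FPAT is a gawk extension.
# show_header is a hypothetical helper; $1 is the record number of the header line.
show_header() {
    gawk -v hdr="$1" '
    BEGIN {
        # a quoted field, or a run of characters that are neither comma nor quote
        FPAT = "\"[^\"]*\"|[^\",]*"
        OFS = "\t"
    }
    NR == hdr {
        # loop over the real field count instead of a fixed 6
        for (i = 1; i <= NF; ++i) {
            colName[i] = $i
            print "Column Name: " colName[i]
        }
        print "", "number of fields: " NF
    }'
}

# Sample records from the question; the header is record 1 in this small input.
printf '%s\n' 'Occupation,States Licensed' 'Barber,51' '"Bus Driver, City/Transit",51' |
    show_header 1
```

For the real files, calling show_header 6 would pick up the header at record 6. With this pattern the comma should no longer be counted, so the header line should report two fields.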