Revisions to Flexible Pattern Matching

added 146 characters in body

edited Apr 11, 2022 at 9:04

Here's a solution that doesn't require fudging with SUBSEP, looping through the fields, pre-sorting the files, or a pre-set number of columns/fields:

    mawk -v _=testfile_001.txt -F/ ' BEGIN { while(getline<_) { __[$!(NF=NF)] } _=close(_)*(FS="^$") } _^($_ in __)' testfile_002.txt
    0 14
    0 15
    0 20
    7200 14
    7200 15

Just realized that setting FS="^$" for the 2nd file is much faster: we're doing line-wide matching, so splitting fields is a waste of time.

Here's a solution that doesn't require fudging with SUBSEP, looping through the fields, pre-sorting the files, or a pre-set number of columns/fields:

    mawk -v _=testfile_001.txt -F/ ' BEGIN { while(getline<_) { __[$!(NF=NF)] } _*=close(_)*(FS="^$") } _^($_ in __)' testfile_002.txt
    0 14
    0 15
    0 20
    7200 14
    7200 15

Just realized that setting FS="^$" for the 2nd file is much faster: we're doing line-wide matching, so splitting fields is a waste of time.

Tested and proven working on gawk 5.1.1 (including flags -c/-P), mawk 1.3.4, mawk 1.9.9.6, and macOS nawk.

-- The 4Chan Teller
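
For readers untangling the one-liner, here is a minimal de-golfed sketch of what it appears to do: load every line of the first file (split on `/` and rejoined with spaces) into a lookup array, then print the lines of the second file that are not in it. The helper name `not_in_ref` and the variable names `ref` and `seen` are hypothetical, not from the answer. The sketch also uses plain `awk` instead of `mawk`, writes `$1 = $1` where the answer writes `NF=NF`, and adds a `> 0` check on `getline` so that a missing file does not loop forever.

```shell
# Hypothetical helper: print lines of file 2 that do not appear in file 1.
# File 1 is "/"-delimited; file 2 is space-delimited.
not_in_ref() {
    awk -v ref="$1" -F/ '
    BEGIN {
        # Load the reference file; "> 0" stops on EOF and on read errors.
        while ((getline < ref) > 0) {
            $1 = $1      # rebuild $0 with OFS (a space), like NF=NF
            seen[$0]     # referencing the element is enough to create the key
        }
        close(ref)
        FS = "^$"        # whole-line comparison from here on, so skip field splitting
    }
    !($0 in seen)        # print only lines missing from the reference file
    ' "$2"
}
```

Called as `not_in_ref testfile_001.txt testfile_002.txt`, this should behave like the one-liner on well-formed input.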

added 69 characters in body

edited Apr 11, 2022 at 8:57

RARE Kpop Manifesto

Here's a solution that doesn't require fudging with SUBSEP or pre-sorted files, and that automatically handles any number of columns instead of just the first 2:

    mawk -v _=testfile_001.txt -F/ ' BEGIN { while(getline<_) { __[$!(NF=NF)] } _=close(_)*(FS="^$") } ($_ in __)<NF' testfile_002.txt
    0 14
    0 15
    0 20
    7200 14
    7200 15

Just realized that setting FS="^$" for the 2nd file is much faster: we're doing line-wide matching, so splitting fields is a waste of time.

Here's a solution that doesn't require fudging with SUBSEP, looping through the fields, pre-sorting the files, or a pre-set number of columns/fields:

    mawk -v _=testfile_001.txt -F/ ' BEGIN { while(getline<_) { __[$!(NF=NF)] } _=close(_)*(FS="^$") } _^($_ in __)' testfile_002.txt
    0 14
    0 15
    0 20
    7200 14
    7200 15

Just realized that setting FS="^$" for the 2nd file is much faster: we're doing line-wide matching, so splitting fields is a waste of time.
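
The pattern `_^($_ in __)` relies on `_` being 0 once the BEGIN block finishes: `$_` is then `$0`, and awk evaluates `0^0` as 1 and `0^1` as 0, so the pattern is true exactly when the line is absent from the array. A small sketch to check that arithmetic, using plain `awk` and a made-up variable name `pow_demo`:

```shell
# z stands in for the zeroed-out "_" variable of the one-liner.
# First value: 0 raised to "not in array" (0); second: 0 raised to "in array" (1).
pow_demo=$(awk 'BEGIN { z = 0; print z^0, z^1 }')
echo "$pow_demo"
```

In other words, the exponent is just a compact way of writing `!($0 in __)`.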

added 69 characters in body

edited Apr 11, 2022 at 8:52

RARE Kpop Manifesto

Here's a solution that doesn't require fudging with SUBSEP or pre-sorted files:

    mawk -v _=testfile_001.txt -F/ ' BEGIN { while(getline<_) { __[$!(NF=NF)] } _=close(_)*(FS=OFS) } !($_ in __)' testfile_002.txt
    0 14
    0 15
    0 20
    7200 14
    7200 15
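
The expression `$!(NF=NF)` packs two steps into one: assigning `NF` to itself makes awk rebuild the record with `OFS` (a space by default), and `!NF` is 0 for any non-empty line, so the whole expression evaluates to the rebuilt `$0`. A sketch with a made-up input line and a hypothetical variable name `rebuilt`, using plain `awk`:

```shell
# Input is "/"-delimited; the rebuilt record comes out space-delimited.
rebuilt=$(printf '7200/15\n' | awk -F/ '{ print $!(NF=NF) }')
echo "$rebuilt"
```

This is why keys loaded from the `/`-delimited first file can be compared directly against the space-delimited lines of the second file.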

Here's a solution that doesn't require fudging with SUBSEP or pre-sorted files, and that automatically handles any number of columns instead of just the first 2:

    mawk -v _=testfile_001.txt -F/ ' BEGIN { while(getline<_) { __[$!(NF=NF)] } _=close(_)*(FS="^$") } ($_ in __)<NF' testfile_002.txt
    0 14
    0 15
    0 20
    7200 14
    7200 15

Just realized that setting FS="^$" for the 2nd file is much faster: we're doing line-wide matching, so splitting fields is a waste of time.

answered Apr 11, 2022 at 8:47

RARE Kpop Manifesto
