AWK selection of the specified columns

Question

I have a file with a large number of columns, like this:

ASN 1 | R ASN 1 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.045 +/- 0.034 | -0.045 +/- 0.034 | 0.000 +/- 0.000 | 0.000 +/- 0.001
HID 2 | R HID 2 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.001 +/- 0.002 | -0.001 +/- 0.002 | 0.000 +/- 0.000 | 0.000 +/- 0.001
PRO 3 | R PRO 3 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.001 +/- 0.004 | -0.001 +/- 0.004 | 0.000 +/- 0.000 | -0.000 +/- 0.001
LYS 4 | R LYS 4 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.182 +/- 0.073 | -0.176 +/- 0.072 | 0.000 +/- 0.000 | 0.005 +/- 0.003
MET 5 | R MET 5 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.004 +/- 0.004 | 0.006 +/- 0.004 | 0.000 +/- 0.000 | 0.002 +/- 0.001

From this file I need to extract only the first and last columns, removing the error value (the +/- part) from the last column, to obtain something like:

ASN 1 0.000

Strangely, the command below works well, except that it does not remove the error from the last column:

gawk -F'[|]' '{print $1, $NF}' $file

ASN 1 0.000 +/- 0.001
HID 2 -0.000 +/- 0.001
PRO 3 -0.000 +/- 0.001
LYS 4 0.000 +/- 0.001
MET 5 -0.000 +/- 0.001
GLU 6 -0.000 +/- 0.001
MET 7 0.000 +/- 0.001
ILE 8 0.000 +/- 0.001
LEU 9 0.001 +/- 0.001

Alternatively, when I replace it with

gawk -F'[|,+/-]' '{print $1, $(NF-1)}' $file

it does not pick out the column before the last one (the value); instead it subtracts 1 from the last (error) column:

ASN 1 -0.999
HID 2 -0.999
PRO 3 -0.999
LYS 4 -0.997

What should I correct to fix the script?

Community · Accepted Answer · 2020-06-20 09:12:55Z


The regex for your field separator is wrong. Use this instead:

gawk -F'\\||\\+/-' 'NF>1{print $1, $(NF-1)}' file

ASN 1 0.000
HID 2 0.000
PRO 3 -0.000
LYS 4 0.005
MET 5 0.002

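That is, use double escaping for regex metacharacters such as | or +. The -F argument is first processed as an awk string, which turns \\ into \, and only then interpreted as a regex.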
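To see what this separator actually produces, here is a minimal sketch run on two rows shaped like the question's data. The rows are shortened to four columns, which is my simplification, and the sketch calls plain awk rather than gawk. Each selected field is printed in brackets so that any leftover spaces are visible.

```shell
# Two rows in the question's layout, shortened to four columns for brevity.
rows=$(printf '%s\n' \
  'ASN 1 | R ASN 1 | 0.045 +/- 0.034 | 0.000 +/- 0.001' \
  'LYS 4 | R LYS 4 | 0.182 +/- 0.073 | 0.005 +/- 0.003')

# The shell passes \\||\\+/- to awk unchanged. awk's string processing turns
# each \\ into \, so the regex is \||\+/- : a literal "|" or the text "+/-".
shown=$(printf '%s\n' "$rows" |
  awk -F'\\||\\+/-' 'NF > 1 { printf "[%s] [%s]\n", $1, $(NF-1) }')

printf '%s\n' "$shown"
```

This prints [ASN 1 ] [ 0.000 ] and [LYS 4 ] [ 0.005 ]. The next-to-last field is the value, and the brackets show that the fields keep their surrounding spaces, which is harmless for print but matters if the fields are compared as strings.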

Code Demo

edited Jun 20, 2020 at 9:12

CommunityBot


answered Feb 6, 2015 at 17:19

anubhava


7 Comments

user3470313 Over a year ago

It produced an error: OMP_OR2W1_linalol.dat FNR=5) fatal: attempt to access field -1

anubhava Over a year ago

With your sample data, what I pasted here is the actual output I got from gawk 4.1.1. By the way, this also works with my BSD awk.

user3470313 Over a year ago

It really only works when I use $NF-1, but in that case it just subtracts ONE from the LAST VALUE :)

Ed Morton Over a year ago

@user3470313 clearly you have empty lines in your file. Just prefix the action block with NF so it only executes on lines that aren't empty.

anubhava Over a year ago

As suggested by @EdMorton, I added a safeguard condition NF>1 to the answer. Try again. By the way, did you check the demo link?
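The two symptoms reported in these comments can be reproduced on a small input. The sketch below is illustrative only: the two-column rows and the blank line are made up to mirror the situation Ed Morton describes, and the separator is a variant (written with bracket expressions) that also absorbs the surrounding spaces so the output is easy to read.

```shell
# Hypothetical input with a blank line in the middle.
input=$(printf '%s\n' \
  'ASN 1 | 0.000 +/- 0.001' \
  '' \
  'LYS 4 | 0.005 +/- 0.003')

# $NF-1 parses as ($NF) - 1: it subtracts 1 from the last field (the error).
minus_one=$(printf '%s\n' "$input" |
  awk -F' *([|]|[+]/-) *' 'NF > 1 { print $1, $NF-1 }')

# $(NF-1) selects the next-to-last field. The NF > 1 guard skips the blank
# line, where NF is 0 and $(NF-1) would be an access to field -1.
guarded=$(printf '%s\n' "$input" |
  awk -F' *([|]|[+]/-) *' 'NF > 1 { print $1, $(NF-1) }')

printf '%s\n' "$minus_one" "$guarded"
```

Here minus_one holds ASN 1 -0.999 and LYS 4 -0.997, the same pattern as in the question, while guarded holds ASN 1 0.000 and LYS 4 0.005.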


David W. · Accepted Answer · 2015-02-06 17:59:44Z

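When you use -F'[|]', you are stating that | is the field separator. Using -F'[|,+/-]' means that any one of these characters acts as a field separator: the pipe, the comma, the plus sign, the slash, or the minus sign. As a result, +/- counts as three separators in a row, with empty fields between them.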
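Printing every field makes the effect of the bracket expression visible. This is a small sketch on a single shortened row (my own reduction of the sample data), with each field shown in brackets:

```shell
# One shortened row; the separator is the bracket expression from the question.
fields=$(printf '%s\n' 'LYS 4 | 0.005 +/- 0.003' |
  awk -F'[|,+/-]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

The output lists five fields: [LYS 4 ], [ 0.005 ], two empty fields, and [ 0.003]. So $(NF-1) is empty and the value sits at $(NF-3). A negative value such as -0.176 would also be cut at its minus sign.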

You have two choices:

Use spaces, but then understand that you need to calculate your columns a bit differently since +/- is now a column. I print columns 1, 2, and the third from the last.

For example:

$ awk '{printf ("%-5.5s %2d %10.3f\n", $1, $2, $(NF - 2))}' test.txt
ASN 1 0.001
HID 2 0.001
PRO 3 0.001
LYS 4 0.003
MET 5 0.001

Or, you can use a fancier regular expression that separates fields on ' *\| *' or ' *\+/- *'. Note that I include the spaces in my field-separator regular expression. This way, spaces are stripped from the columns:

$ awk -F' *\| *| *\+/- *' \
    '{printf ("%-5.5s %2d %10.3f\n", $1, $2, $NF)}' file
ASN 1 0.001
HID 2 0.001
PRO 3 0.001
LYS 4 0.003
MET 5 0.001

This works with standard awk on BSD and nawk on Solaris. gawk might do things a bit differently.
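With the question's sample rows, the value asked for is the field just before the error, and both choices can be pointed at it. The following is a hedged sketch on two shortened sample rows, not the answer's original commands: it uses print instead of printf, adds an NF guard, and writes the separator with bracket expressions ([|] and [+]) instead of backslashes, which is my substitution to sidestep the escaping warning mentioned in the comments below.

```shell
# Two rows in the question's layout, shortened to four columns for brevity.
rows=$(printf '%s\n' \
  'ASN 1 | R ASN 1 | 0.045 +/- 0.034 | 0.000 +/- 0.001' \
  'LYS 4 | R LYS 4 | 0.182 +/- 0.073 | 0.005 +/- 0.003')

# Choice 1: default whitespace splitting. "+/-" is a field of its own,
# so the value is the third field from the end.
by_space=$(printf '%s\n' "$rows" |
  awk 'NF > 2 { print $1, $2, $(NF-2) }')

# Choice 2: separators that absorb the surrounding spaces.
# The value is then the next-to-last field.
by_regex=$(printf '%s\n' "$rows" |
  awk -F' *[|] *| *[+]/- *' 'NF > 1 { print $1, $(NF-1) }')

printf '%s\n' "$by_space" "$by_regex"
```

Both variables end up holding ASN 1 0.000 and LYS 4 0.005. With choice 2, $1 is the whole "ASN 1" label, so $2 is no longer the residue number.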

It again produces an error if I use $(NF - 1), and it subtracts 1 from the value of the last column if I use $NF - 1. It seems to be a problem with my AWK, doesn't it?
Are you cutting and pasting these commands from my answer? The first one works with no errors on GNU Awk 3.1.5 on RHEL. The second one gives me a warning that | and + will be treated as plain characters and not as special regular expressions. Otherwise, it works.
What machine are you using? What is your OS? Maybe you have an OS like Solaris where things don't work as they would on BSD or Linux.
The OS doesn't matter. He's using gawk, it works just fine. The OP needs to tell us more about his input file and what exactly the problems/errors are he's seeing.
