I have a text file in which the number of columns varies from line to line.

I want to print a line only if columns 3, 4 and 5 of that line contain only numbers.

The trick is that columns 3, 4 and 5 occasionally have the special characters "(" or ")" embedded in them, and I want to print these numbers too.

cat $filename | awk '{ if ( ($3 != "^[0-9]") && ($4 != "^[0-9]") && ($5 != "^[0-9]") ) print $2, $3, $4, $5 }' >>text.dat 

But it also prints things such as Au2, Cu2, etc.

Any suggestions?

UPDATE:

A relevant part of input text file looks like this:

Cu1 Cu 0.00000 0.094635(14) 0.094635(14)
Cu2 Cu 0.00000 0.125943(15) 0.125943(15)
. . .

What I want is the following:

Cu 0.00000 0.094635 0.094635
Cu 0.00000 0.125943 0.125943
. . .

Note that "Cu" comes from the second column of the original input file, and that I have removed the parentheses and the numbers inside them from columns 4 and 5. Parentheses could appear in column 3 as well, and the number inside the parentheses could be a single digit.

1 Answer

In your code:

 ($3 != "^[0-9]") && ($4 != "^[0-9]") && ($5 != "^[0-9]") 

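!= means "not equal to". It compares the field with the literal string "^[0-9]" and does not do any regex matching.

Use the match operator ~ instead, for example $3~/^[0-9.]+$/ && $4~/^[0-9.]+$/ and so on. The anchors ^ and $ make the pattern cover the whole field, so a value like Cu2 is rejected.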
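To see the difference between the two operators, here is a small sketch. The helper name classify and the two sample words are made up for illustration; the regex used here only accepts plain digits and is anchored so that the whole field has to match.

```shell
# classify is a hypothetical helper: for each input word it reports the
# result of the string comparison and of an anchored regex match.
classify() {
  awk '{
    str_cmp  = ($1 != "^[0-9]") ? "differs" : "equal"        # plain string comparison
    re_match = ($1 ~ /^[0-9]+$/) ? "digits" : "not-digits"   # regex match on the whole field
    print $1, str_cmp, re_match
  }'
}

printf '%s\n' Cu2 12345 | classify
# prints:
# Cu2 differs not-digits
# 12345 differs digits
```

Both words "differ" from the literal string "^[0-9]", which is why the original condition is true for every line.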

For the ( and ) problem: before you test $3, $4 and $5 against the regex, remove the parenthesised parts from those fields (replace them with ""), then do the match.
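A quick sketch of that substitution step on a single value, assuming the goal from the question's update (drop the digits inside the parentheses as well). The helper names strip_chars and strip_group are made up for illustration.

```shell
# strip_chars removes only the "(" and ")" characters themselves.
strip_chars() { awk '{ gsub(/[()]/, "", $1); print $1 }'; }

# strip_group removes each parenthesised group together with its contents.
strip_group() { awk '{ gsub(/\([^)]*\)/, "", $1); print $1 }'; }

echo '0.094635(14)' | strip_chars   # prints 0.09463514
echo '0.094635(14)' | strip_group   # prints 0.094635
```

Removing only the bracket characters glues the extra digits onto the number, so the second form is probably the one you want.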

I hope the explanation above is clear enough.

EDIT

awk '{for(i=3;i<=5;i++)gsub(/\([^)]*\)/,"",$i)}$3~/^[0-9.]+$/&&$4~/^[0-9.]+$/&&$5~/^[0-9.]+$/' file 

This line does the following:

  • removes (...) from $3, $4 and $5
  • checks whether $3, $4 and $5 consist only of digits and dots (an integer or a decimal)
  • if so, prints the line
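The three steps above can also be written out in a longer, commented form. This is only a sketch: the wrapper name filter_numeric and the second sample line are made up, and the number test is a slightly stricter assumption (digits, optionally followed by a dot and more digits), anchored with ^ and $ so that the whole field has to match.

```shell
# filter_numeric is a hypothetical wrapper that spells the steps out one by one.
filter_numeric() {
  awk '{
    ok = 1
    for (i = 3; i <= 5; i++) {
      gsub(/\([^)]*\)/, "", $i)                  # step 1: drop "(...)" groups from the field
      if ($i !~ /^[0-9]+(\.[0-9]+)?$/) ok = 0    # step 2: whole field must be a number
    }
    if (ok) print                                # step 3: print the cleaned line
  }'
}

printf '%s\n' \
  'Cu1 Cu 0.00000 0.094635(14) 0.094635(14)' \
  'Au2 Au x1 Cu2 0.5' | filter_numeric
# prints: Cu1 Cu 0.00000 0.094635 0.094635
```

Lines with fewer than five columns are skipped as well, because an empty field does not match the number pattern.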

With your example:

kent$ echo "Cu1 Cu 0.00000 0.094635(14) 0.094635(14)
Cu2 Cu 0.00000 0.125943(15) 0.125943(15)"|awk '{for(i=3;i<=5;i++)gsub(/\([^)]*\)/,"",$i)}$3~/^[0-9.]+$/&&$4~/^[0-9.]+$/&&$5~/^[0-9.]+$/'
Cu1 Cu 0.00000 0.094635 0.094635
Cu2 Cu 0.00000 0.125943 0.125943

To print only $2, $3, $4 and $5:

awk '{for(i=3;i<=5;i++)gsub(/\([^)]*\)/,"",$i);if($3~/^[0-9.]+$/&&$4~/^[0-9.]+$/&&$5~/^[0-9.]+$/)print $2,$3,$4,$5}' file 

3 Comments

It still prints lines that have a column with a combination of letters and numbers...
@Greg If you give an input/output example, it will help you get an answer sooner.
Thanks, but I do not want the first column printed. How do I do that?
