
My data looks like:

A 4 G 1 G 1
C 4 C 2 C 2
T 6 T 5 T 5
A 6 T 2 T 2
C 6 T 2 T 2
T 6 G 2 G 2

I am trying the command:

awk -F " " '$1==$3 {$7=$6; print $0;} $1==$5 {$7=$4; print $0;} ($1 != $3 && $1 != $5) {$7=$2; print $0}' test.txt 

While the data has only 5 lines, the output has 7 lines, and certain lines appear to be duplicated at random.

This happens only with this dataset and not with my other datasets. Can someone please help? I don't understand what is happening.

3
  • 1
    If e.g. both $1==$3 and $1==$5 are true, both the first two blocks run and print. That's the case on lines 2 and 3. The two blocks also both set $7 from two different fields, though those are the same on the two lines it happens here. Commented Mar 3, 2023 at 15:34
  • That helps! Thanks so much. Commented Mar 3, 2023 at 15:53
  • 1
    It's obvious what's causing the problem you asked about (sometimes both of the first 2 conditions are true and you're printing each time) but it's not obvious what you wanted to do instead (e.g. what SHOULD $7 be set to if both of the first 2 conditions are true - $4 or $6 or some concatenation of both or $2 or something else?). If you edit your question to add the expected output for the sample input you provided and state that requirement then we can help you with that. -F " " is useless btw as that's setting FS to the default value it already has, just remove it. Commented Mar 3, 2023 at 16:22

2 Answers

1

You didn't describe exactly how you want this to behave - so I'm applying some guesswork here.

Look at the duplicate lines, e.g.

C 4 C 2 C 2 

$1 is the same as $3 so the first block fires. $1 is the same as $5 so the second block fires.
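A quick way to see which input lines trigger more than one block is to count the true conditions per line. This is only a sketch: it assumes the sample data is six fields per row, and the variable names `data` and `matches` are made up for illustration.

```shell
# Sample rows, assuming the data in the question is six fields per line.
data='A 4 G 1 G 1
C 4 C 2 C 2
T 6 T 5 T 5
A 6 T 2 T 2
C 6 T 2 T 2
T 6 G 2 G 2'

# For each row, count how many of the three patterns are true.
# A comparison in awk evaluates to 1 (true) or 0 (false), so they can be summed.
matches=$(printf '%s\n' "$data" | awk '{
    n = ($1 == $3) + ($1 == $5) + ($1 != $3 && $1 != $5)
    print NR ": " n
}')
printf '%s\n' "$matches"
```

Any row reported with a count of 2 is printed twice by the original command; with these rows that is rows 2 and 3.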

If you only want one line of output per line of input, then only output the data in one place, e.g.

awk -F " " '$1==$3 {$7=$6;} ($1==$5) {$7=$4; } ($1 != $3 && $1 != $5) {$7=$2} ($7 != "") { print $0 }' test.txt 

I think this is the behaviour you are looking for, however it will produce the same or fewer lines of output than input. If you want one line of output for each input line, then remove the condition on the last block.

1
  • It's impossible for $7 to be "" at the line ($7 != "") ... given the logic above that line. Commented Mar 3, 2023 at 16:02
0
awk -F " " '$1==$3 {$7=$6; print $0;} $1==$5 {$7=$4; print $0;} ($1 != $3 && $1 != $5) {$7=$2; print $0}' test.txt 

If e.g. both $1==$3 and $1==$5 are true, both the first two blocks run and print. That's the case on lines 2 and 3. The two blocks also both set $7 from two different fields, though those are the same on the two lines it happens here.

If you want to only ever print each line a maximum of one time, you can set a flag from the branches and print based on that (or not), e.g.:

awk -F " " '{ p=0; } $1==$3 {$7=$6; p=1} $1==$5 {$7=$4; p=1} ($1 != $3 && $1 != $5) {$7=$2; p=1} p {print}' test.txt 

print with no arguments prints $0, and you can even end the script with a bare p and no code block, since the default action for a pattern without a block is to print the record.

Similarly, to unconditionally print every line, you often see just a trailing 1, as in awk '/.../ { ... } 1'
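As a small illustration of both idioms, here is a sketch on two made-up rows (the variable names `data`, `flagged` and `all` are only for this example):

```shell
data='A 4 G 1 G 1
C 4 C 2 C 2'

# A bare pattern with no action block prints the record when the pattern is true.
flagged=$(printf '%s\n' "$data" | awk '{ p = ($1 == $3) } p')

# A trailing 1 is an always-true pattern, so every record is printed exactly once.
all=$(printf '%s\n' "$data" | awk '$1 == $3 { $7 = $6 } 1')

printf '%s\n---\n%s\n' "$flagged" "$all"
```

The first command prints only the row where $1 equals $3; the second prints both rows, with $7 appended only to the row that matched.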

You'll have to decide what to do with field $7 though, as those three branches all set it to different values.

If you want to execute only one of the blocks (at most), you can use the next statement in each to go to the next line:

awk -F " " '$1==$3 {$7=$6; print; next} $1==$5 {$7=$4; print; next} ($1 != $3 && $1 != $5) {$7=$2; print; next} ' test.txt 

...and actually looking at the conditions, it seems to me the last one is true if and only if the first two are false, so we might as well write it all as an if-else:

awk -F " " '{ if ($1==$3) { $7=$6 } else if ($1==$5) { $7=$4 } else { $7=$2 }; print; }' test.txt 
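To check that an if/else chain gives exactly one output line per input line, here is a sketch run against the sample rows, assuming six fields per row. The first matching branch wins, so when both $1==$3 and $1==$5 hold, $7 is taken from $6; whether that is the value wanted is still for the asker to decide. The variable names `data` and `result` are made up for illustration.

```shell
# Sample rows, assuming the data in the question is six fields per line.
data='A 4 G 1 G 1
C 4 C 2 C 2
T 6 T 5 T 5
A 6 T 2 T 2
C 6 T 2 T 2
T 6 G 2 G 2'

# Exactly one branch runs per record, then the record is printed once.
result=$(printf '%s\n' "$data" | awk '{
    if ($1 == $3) {
        $7 = $6
    } else if ($1 == $5) {
        $7 = $4
    } else {
        $7 = $2
    }
    print
}')
printf '%s\n' "$result"
```

With these rows the output has six lines, one per input row, each with a seventh field appended.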
3
  • (there's already a state variable in the script - $7) Commented Mar 3, 2023 at 15:42
  • 1
    @symcbean, only if the input only ever contains at most six fields on a line. If there's ever a seventh field, $7 will already be set even without any of the conditions being true Commented Mar 3, 2023 at 15:44
  • @symcbean, ... and actually looking at the conditions, it seems to me ($1 != $3 && $1 != $5) is only ever true if the first two ones are false so we might as well print unconditionally Commented Mar 3, 2023 at 15:46
