Became Hot Network Question

occurred Feb 23, 2021 at 15:50

edited body

edited Feb 23, 2021 at 9:09


Change list to "actual" list


edited Feb 23, 2021 at 8:22

AdminBee



asked Feb 23, 2021 at 7:47

nstatam


How to compare two columns of two files and print the non-matching patterns with awk

I have a data file A.tsv (field separator = \t):

id    clade    mutation
243   40A      siti,toto,mumu
254
267   40B      lala,sisi,sojo

and a template file B.tsv (field separator = \t):

40A   toto,xixi,saxa
40B   lala,sojo,huhu
40C   sasa,sisi,lala

Based on their common column (clade), I want to compare the mutations in A.tsv against the template B.tsv. I have two questions:

1) How can I list, in a new column named "missing_mutation", the mutations in B.tsv that are not present in A.tsv?
2) How can I list, in a new column named "remaining_mutation", the mutations that are present in A.tsv (and start with the letter "s", upper or lower case) but not in B.tsv?

The expected output, C.tsv, looks like this:

id    clade    mutation          missing_mutation    remaining_mutation
243   40A      siti,toto,mumu    xixi,saxa           siti
254
267   40B      lala,sisi,sojo    huhu                sisi

I know how to compare two files like this:

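awk -F'\t' -v OFS='\t' '
    NR==FNR { a[$1]=$2; next }   # B.tsv: map each clade to its template mutations
    { print $0, a[$2] }          # A.tsv: append the template mutations for the clade
' B.tsv A.tsv > C.tsv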

but I don't know how to print the mutations that don't match. Do you have any ideas?
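One possible direction is sketched below. It is a starting point built on assumptions, not a definitive answer. It assumes both files are tab-separated, that A.tsv has a header line, that B.tsv holds the clade in column 1 and its mutations in column 2, and that rows of A.tsv without a known clade should be copied unchanged. The array and variable names (tmpl, inA, inB, missing, remaining) are made up for illustration, and the two printf lines only recreate the sample data shown above.

```shell
# Recreate the sample inputs from the question (tab-separated).
printf 'id\tclade\tmutation\n243\t40A\tsiti,toto,mumu\n254\n267\t40B\tlala,sisi,sojo\n' > A.tsv
printf '40A\ttoto,xixi,saxa\n40B\tlala,sojo,huhu\n40C\tsasa,sisi,lala\n' > B.tsv

awk -F'\t' -v OFS='\t' '
# First file (B.tsv): remember the template mutations of each clade.
NR == FNR { tmpl[$1] = $2; next }

# Header line of A.tsv: append the two new column names.
FNR == 1 { print $0, "missing_mutation", "remaining_mutation"; next }

# Rows with no clade, or a clade absent from the template, are copied as is (assumption).
!($2 in tmpl) { print; next }

{
    missing = remaining = ""
    split("", inA); split("", inB)          # portable way to empty both arrays
    na = split($3, a, ",")                  # mutations listed in A.tsv
    nb = split(tmpl[$2], b, ",")            # mutations listed in the template
    for (i = 1; i <= na; i++) inA[a[i]] = 1
    for (i = 1; i <= nb; i++) inB[b[i]] = 1

    # Template mutations that A.tsv lacks.
    for (i = 1; i <= nb; i++)
        if (!(b[i] in inA))
            missing = (missing == "" ? b[i] : missing "," b[i])

    # Mutations of A.tsv starting with s or S that the template lacks.
    for (i = 1; i <= na; i++)
        if (!(a[i] in inB) && a[i] ~ /^[sS]/)
            remaining = (remaining == "" ? a[i] : remaining "," a[i])

    print $0, missing, remaining
}' B.tsv A.tsv > C.tsv
```

With the sample files, the 40A row should end with xixi,saxa and siti, and the 40B row with huhu and sisi. Splitting each mutation list into a lookup array keeps the comparison exact, so a mutation named "si" would not be mistaken for "siti" the way a substring match could.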

text-processing awk
