How to compare two columns of two files and print non-matching patterns with awk

I have a data file A.tsv (field separator = \t):

    id   clade  mutation
    243  40A    siti,toto,mumu
    254
    267  40B    lala,sisi,sojo

and a template file B.tsv (field separator = \t):

    40A  toto,xixi,saxa
    40B  lala,sojo,huhu
    40C  sasa,sisi,lala

Based on their common column (clade), I want to compare the mutations in A.tsv against the template B.tsv. I have two questions:

  1. How can I add a new column named "missing_mutation" listing the mutations from B.tsv that aren't present in A.tsv?
  2. How can I add a new column named "remaining_mutation" listing the mutations that are present in A.tsv (and that start with the letter "s", case-insensitive) but not present in B.tsv?
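For a single row, the comparison boils down to two set differences between comma-separated lists. A minimal sketch in awk, assuming the lists contain no spaces; `diff_mutations` is a hypothetical shell helper introduced only for illustration:

```shell
# Hypothetical helper (not from the question): given a comma-separated list from
# A.tsv ($1) and the template list for the same clade from B.tsv ($2), print
# "<missing_mutation><TAB><remaining_mutation>".
diff_mutations() {
    awk -v a_list="$1" -v b_list="$2" '
    # Append item to a comma-separated list.
    function add(list, item) { return (list == "") ? item : (list "," item) }
    BEGIN {
        # Load both lists into associative arrays used as sets.
        na = split(a_list, a_arr, ",")
        nb = split(b_list, b_arr, ",")
        for (i = 1; i <= na; i++) setA[a_arr[i]] = 1
        for (i = 1; i <= nb; i++) setB[b_arr[i]] = 1

        # missing_mutation: in the template (B) but not in A, in template order.
        for (i = 1; i <= nb; i++)
            if (!(b_arr[i] in setA)) missing = add(missing, b_arr[i])

        # remaining_mutation: in A, starting with s or S, but not in B.
        for (i = 1; i <= na; i++)
            if (tolower(substr(a_arr[i], 1, 1)) == "s" && !(a_arr[i] in setB))
                remaining = add(remaining, a_arr[i])

        printf "%s\t%s\n", missing, remaining
    }'
}

# First example row: clade 40A.
diff_mutations "siti,toto,mumu" "toto,xixi,saxa"
```

For the 40A row this prints `xixi,saxa` and `siti` separated by a tab; for `lala,sisi,sojo` against `lala,sojo,huhu` it prints `huhu` and `sisi`. Using arrays as sets with the `in` operator avoids substring false positives (e.g. `sisi` matching inside a longer name) that a plain `index()` search on the whole string could produce.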

The expected output, C.tsv, looks like this:

    id   clade  mutation        missing_mutation  remaining_mutation
    243  40A    siti,toto,mumu  xixi,saxa         siti
    254
    267  40B    lala,sisi,sojo  huhu              sisi

I know how to compare two files like this:

    awk -F'\t' -v OFS='\t' 'NR==FNR { a[$1]=$2; next } { print $0, a[$2] }' B.tsv A.tsv > C.tsv

but I don't know how to print the entries that don't match. Do you have any idea how to do this?
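One possible way to extend the `NR==FNR` join into both new columns is sketched below. It assumes the files are genuinely tab-separated, that B.tsv has no header line, and that the first line of A.tsv is a header. To be runnable as-is it recreates the sample files with `printf` inside a scratch directory created by `mktemp -d` (widely available, though not POSIX), so no existing files are overwritten:

```shell
# Work in a scratch directory so no existing A.tsv/B.tsv/C.tsv is overwritten.
cd "$(mktemp -d)" || exit 1

# Recreate the sample inputs from the question (tab-separated; row 254 has
# empty clade and mutation fields).
printf 'id\tclade\tmutation\n243\t40A\tsiti,toto,mumu\n254\t\t\n267\t40B\tlala,sisi,sojo\n' > A.tsv
printf '40A\ttoto,xixi,saxa\n40B\tlala,sojo,huhu\n40C\tsasa,sisi,lala\n' > B.tsv

awk '
# Append item to a comma-separated list.
function add(list, item) { return (list == "") ? item : (list "," item) }

BEGIN { FS = OFS = "\t" }

# First file (B.tsv): remember the template list for each clade.
NR == FNR { tmpl[$1] = $2; next }

# Header of A.tsv: append the two new column names.
FNR == 1 { print $0, "missing_mutation", "remaining_mutation"; next }

{
    missing = remaining = ""
    if ($2 in tmpl) {
        # split() empties its target array first, so the per-row sets are fresh.
        na = split($3, a_arr, ",")
        nb = split(tmpl[$2], b_arr, ",")
        split("", setA); split("", setB)
        for (i = 1; i <= na; i++) setA[a_arr[i]] = 1
        for (i = 1; i <= nb; i++) setB[b_arr[i]] = 1

        # missing_mutation: in the B.tsv template but not in A.tsv.
        for (i = 1; i <= nb; i++)
            if (!(b_arr[i] in setA)) missing = add(missing, b_arr[i])

        # remaining_mutation: in A.tsv, starts with s or S, not in the template.
        for (i = 1; i <= na; i++)
            if (tolower(substr(a_arr[i], 1, 1)) == "s" && !(a_arr[i] in setB))
                remaining = add(remaining, a_arr[i])
    }
    print $0, missing, remaining
}' B.tsv A.tsv > C.tsv

cat C.tsv
```

With the sample data this reproduces the expected C.tsv; row 254 simply gets two empty columns because its empty clade has no template in B.tsv. The script sticks to `split()`, `tolower()`, `substr()` and user-defined functions, all of which are in POSIX awk.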