Intersection of two files based on two columns and one condition using bash

Asked by John

I have two files. The first is tab-separated and has content like this:

    col1    col2    col3    col4
    Stef    123     SE      383
    Lena    938     Y       X
    John    738     T       Y
    Stef    827     uq      hd
    Stef    81      tt      vv

I have another file with just one column:

    837
    123
    839
    827

I want to make a new file that is the intersection of the second column of the first file and the only column of my second file, but I also want to take the first column of the first file into account.

I know I can do an intersection using:

    join <(sort file1) <(sort file2)

But I don't know how to specify that the intersection should be on the first column of the second file and the second column of the first file. I also want the intersection to depend on a given value in the first column of the first file. For example, I only want the intersection between the two files for rows where the first column of the first file is Stef, so the resulting file becomes:

    col1    col2    col3    col4
    Stef    123     SE      383
    Stef    827     uq      hd
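Since the question starts from `join`, here is a minimal sketch of that route. It assumes the first file is tab-separated with exactly one header line and that the name to keep sits in a shell variable `name` (a hypothetical choice). The sample data is written to a temporary directory (`$work`, also hypothetical) so no real files are overwritten; swap in your own paths. `join -1 2 -2 1` matches field 2 of the first input against field 1 of the second, and both inputs must be sorted on those fields with the same collation, which is why `LC_ALL=C` is exported.

```shell
#!/usr/bin/env bash
# Sketch: join field 2 of file1 against field 1 of file2, then keep only
# rows whose first column equals $name. Assumes one header line in file1.
set -euo pipefail
export LC_ALL=C                 # sort and join must use the same collation

# Demo data in a temporary directory (replace with your real files).
work=$(mktemp -d)
printf 'col1\tcol2\tcol3\tcol4\nStef\t123\tSE\t383\nLena\t938\tY\tX\nJohn\t738\tT\tY\nStef\t827\tuq\thd\nStef\t81\ttt\tvv\n' > "$work/file1"
printf '837\n123\n839\n827\n' > "$work/file2"

name=Stef                       # hypothetical: the column-1 value to keep
tab=$'\t'

# Copy the header through unchanged; sorting would otherwise mix it into the data.
head -n 1 "$work/file1" > "$work/result"

# join prints the join field first, so awk filters on the name (now field 2)
# and moves the key back to column 2 to restore the original layout.
join -t "$tab" -1 2 -2 1 \
    <(tail -n +2 "$work/file1" | sort -t "$tab" -k2,2) \
    <(sort "$work/file2") |
    awk -F "$tab" -v OFS="$tab" -v name="$name" '$2 == name { print $2, $1, $3, $4 }' \
    >> "$work/result"

cat "$work/result"
```

Note that `join` emits rows in key order rather than in the original file order; if the original order matters, the awk approach below keeps it.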

How can I achieve this using bash and awk? I tried doing it in pandas, but because my files are very big, loading them into my Jupyter notebook takes a long time. Any insights would be appreciated.

My awk script:

    awk 'NR==FNR{A[$1];next}$2 in A' file2.txt file1.txt > sample.txt
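The awk script above only needs one extra test to take the first column into account. A minimal sketch, assuming the same layout (tab-separated `file1.txt` with one header line, one key per line in `file2.txt`) and a hypothetical shell variable `name` holding the value to filter on; the demo data again goes into a temporary directory `$work` so nothing real is overwritten:

```shell
#!/usr/bin/env bash
# Sketch: the NR==FNR lookup from the question plus a condition on column 1.
set -euo pipefail

# Demo data in a temporary directory (replace with your real files).
work=$(mktemp -d)
printf 'col1\tcol2\tcol3\tcol4\nStef\t123\tSE\t383\nLena\t938\tY\tX\nJohn\t738\tT\tY\nStef\t827\tuq\thd\nStef\t81\ttt\tvv\n' > "$work/file1.txt"
printf '837\n123\n839\n827\n' > "$work/file2.txt"

name=Stef   # hypothetical: the column-1 value to keep

awk -F'\t' -v name="$name" '
    NR == FNR { keys[$1]; next }    # first file read (file2.txt): store every key
    FNR == 1  { print; next }       # first line of file1.txt is the header: keep it
    $1 == name && ($2 in keys)      # data rows: column 1 matches and column 2 is a key
' "$work/file2.txt" "$work/file1.txt" > "$work/sample.txt"

cat "$work/sample.txt"
```

Only the keys from `file2.txt` are held in memory while `file1.txt` is streamed line by line, and matching rows keep their original order and formatting. One caveat of the `NR == FNR` idiom: if `file2.txt` is empty, every line of `file1.txt` is treated as a key and nothing is printed.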
