added 97 characters in body

edited Sep 15, 2021 at 9:44

I have two files. The first is tab-separated and has content like this:

    col1.  col2  col2  col4
    Stef.  123   SE    383
    Lena   938   Y     X
    John   738   T     Y
    Stef   827   uq    hd
    Stef   81    tt    vv

I have another file with just one column:

    837
    123
    839
    827

I want to make a new file that is the intersection of the second column of the first file and the only column of the second file. But I also want to take the first column of the first file into account.

I know I can do an intersection using:

    join <(sort file1) <(sort file2)

But I don't know how to specify that the join should use the second column of the first file and the first column of the second file. I also want the intersection to depend on a given value in the first column of the first file. For example, I only want rows in the intersection where the first column of the first file is Stef, so the resulting file becomes:

    col1.  col2  col2  col4
    Stef.  123   SE    383
    Stef   827   uq    hd

How can I achieve this using bash and awk? I tried doing it in pandas, but because my files are very big, it takes a long time to load them in my Jupyter notebook. Any insights will be appreciated.
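For reference, one way the `join` approach could be pointed at specific columns is sketched below. It is only an illustration under several assumptions: the sample data mirrors the question but omits the trailing periods in `col1.` and `Stef.` (treated here as typos), the name `Stef` is hard-coded, and the intermediate file names (`file1.sorted`, `file2.sorted`, `joined.txt`) are made up for the example. The `-1`, `-2`, `-t` and `-o` options are standard `join` options.

```shell
#!/bin/sh
# Sketch only: run in an empty scratch directory, because it overwrites
# file1.txt, file2.txt, file1.sorted, file2.sorted and joined.txt.
set -eu
export LC_ALL=C              # sort and join must agree on collation order
tab=$(printf '\t')

# Sample data modelled on the question (trailing periods omitted).
printf 'col1\tcol2\tcol2\tcol4\n' > file1.txt
printf 'Stef\t123\tSE\t383\nLena\t938\tY\tX\nJohn\t738\tT\tY\n' >> file1.txt
printf 'Stef\t827\tuq\thd\nStef\t81\ttt\tvv\n' >> file1.txt
printf '837\n123\n839\n827\n' > file2.txt

# join needs each input sorted on its join field; the header is set aside.
tail -n +2 file1.txt | sort -t "$tab" -k2,2 > file1.sorted
sort file2.txt > file2.sorted

{
  head -n 1 file1.txt
  # -1 2: join field is column 2 of the first input
  # -2 1: join field is column 1 of the second input
  # -o:   print the first input's columns in their original order
  join -t "$tab" -1 2 -2 1 -o 1.1,1.2,1.3,1.4 file1.sorted file2.sorted |
    awk -F '\t' -v name="Stef" '$1 == name'   # keep only the wanted name
} > joined.txt

cat joined.txt
```

Note that the rows come out in the sort order of the join column, not in the original order of the first file, and that sorting a very large file has its own cost.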

My awk script:

    awk 'NR==FNR{A[$1];next}$2 in A' file2.txt file1.txt > sample.txt
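The script above intersects on the second column but ignores the first-column condition and the header. A minimal sketch of how both might be added follows. It assumes the first file is tab-separated with exactly one header line, that the trailing periods in `col1.` and `Stef.` are typos (the sample data below omits them), and that the name to match is passed in through a variable called `name`, which is a name chosen here for illustration.

```shell
#!/bin/sh
# Sketch only: run in an empty scratch directory, because it overwrites
# file1.txt, file2.txt and sample.txt.

# Sample data modelled on the question (trailing periods omitted).
printf 'col1\tcol2\tcol2\tcol4\n' > file1.txt
printf 'Stef\t123\tSE\t383\nLena\t938\tY\tX\nJohn\t738\tT\tY\n' >> file1.txt
printf 'Stef\t827\tuq\thd\nStef\t81\ttt\tvv\n' >> file1.txt
printf '837\n123\n839\n827\n' > file2.txt

awk -F '\t' -v name="Stef" '
    NR == FNR { ids[$1]; next }    # first input (file2.txt): remember each ID
    FNR == 1  { print; next }      # second input (file1.txt): keep the header
    $1 == name && ($2 in ids)      # print rows with the wanted name and a listed ID
' file2.txt file1.txt > sample.txt

cat sample.txt
```

Because awk streams the first file line by line and only holds the IDs from the second file in memory, this keeps the original row order and avoids sorting.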

nothing to do with Bash, if you're going to use `join`, `sort` or awk anyway

edited Sep 15, 2021 at 9:24

ilkkachu

Intersection of two files based on two columns and one condition using bash

asked Sep 15, 2021 at 9:20

John

bash awk
