I have two files. File1:

ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11025 0.9092 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11044 0.9626 688423261 2 01/04/2015 0.9983763305
ARS-BFGL-BAC-11193 0.9544 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11025 0.9092 688423263 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11044 0.9626 688423263 2 01/04/2015 0.9983763305
ARS-BFGL-BAC-11193 0.9544 688423263 1 01/04/2015 0.9983763305

File2:

ARS-BFGL-BAC-10975 10 21225382
ARS-BFGL-BAC-11025 10 84516867
ARS-BFGL-BAC-11044 1 12805406
ARS-BFGL-BAC-11193 1 29303546

Desired output:

ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305 10 21225382
ARS-BFGL-BAC-11025 0.9092 688423261 1 01/04/2015 0.9983763305 10 84516867
ARS-BFGL-BAC-11193 0.9544 688423261 1 01/04/2015 0.9983763305 1 29303546
ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305 10 21225382
ARS-BFGL-BAC-11025 0.9092 688423263 1 01/04/2015 0.9983763305 10 84516867
ARS-BFGL-BAC-11193 0.9544 688423263 1 01/04/2015 0.9983763305 1 29303546

So File1 has many more rows than File2. I only want to keep the File1 rows whose value in column 1 also appears in column 1 of File2.
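For the filtering part alone, a minimal sketch is an awk semi-join: read File2 first and remember its column-1 values, then print each File1 line whose first field is among them. This keeps File1's original order and any repeated keys, and only File2's keys are held in memory. The sample files below are hypothetical, built from rows in the question. One caveat: NR == FNR is also true for every File1 line if File2 happens to be empty.

```shell
#!/bin/sh
# Sketch: keep only File1 rows whose column 1 occurs in File2 (a semi-join).
# The two sample files are hypothetical, built from rows in the question.
tmp=$(mktemp -d)
printf '%s\n' \
  'ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305' \
  'ARS-BFGL-BAC-11044 0.9626 688423261 2 01/04/2015 0.9983763305' \
  'ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305' \
  > "$tmp/file1"
printf '%s\n' 'ARS-BFGL-BAC-10975 10 21225382' > "$tmp/file2"

# While reading file2 (NR == FNR), just record each key; "next" skips the rest.
# For file1, the bare pattern "$1 in keys" prints the whole line on a match.
filtered=$(awk 'NR == FNR { keys[$1]; next } $1 in keys' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$filtered"

rm -r "$tmp"
```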

I have tried join, but I can't get it to work; it tells me my files are not sorted:

join -j 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2,2.3 <(sort -k1 file1) <(sort -k1 file2) 
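The usual advice for join's "not sorted" message is to sort on the join field alone (sort -k1,1 rather than sort -k1, which keys on everything from field 1 to the end of the line) and to make sort and join use the same collation, for example LC_ALL=C for both. Whether that is the actual cause here is an assumption. The sketch below applies both changes and writes the sorted copies to temporary files instead of using <(...), so it also runs under a plain POSIX sh. File2 here deliberately omits ARS-BFGL-BAC-11044 (an illustrative assumption) so that the dropped rows are visible.

```shell
#!/bin/sh
# Sketch: the join approach with field-1-only sorting and one fixed collation.
# Sample data from the question; File2 omits ARS-BFGL-BAC-11044 on purpose
# (an illustrative assumption) so that the dropped rows are visible.
export LC_ALL=C   # make sort and join agree on the ordering

tmp=$(mktemp -d)
printf '%s\n' \
  'ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305' \
  'ARS-BFGL-BAC-11025 0.9092 688423261 1 01/04/2015 0.9983763305' \
  'ARS-BFGL-BAC-11044 0.9626 688423261 2 01/04/2015 0.9983763305' \
  'ARS-BFGL-BAC-11193 0.9544 688423261 1 01/04/2015 0.9983763305' \
  'ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305' \
  'ARS-BFGL-BAC-11025 0.9092 688423263 1 01/04/2015 0.9983763305' \
  'ARS-BFGL-BAC-11044 0.9626 688423263 2 01/04/2015 0.9983763305' \
  'ARS-BFGL-BAC-11193 0.9544 688423263 1 01/04/2015 0.9983763305' \
  > "$tmp/file1"
printf '%s\n' \
  'ARS-BFGL-BAC-10975 10 21225382' \
  'ARS-BFGL-BAC-11025 10 84516867' \
  'ARS-BFGL-BAC-11193 1 29303546' \
  > "$tmp/file2"

# -k1,1 keys on field 1 only; -k1 would key on field 1 through end of line.
sort -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
sort -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

# Same -o list as in the question: File1's six columns, then File2's 2 and 3.
joined=$(join -j 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2,2.3 \
  "$tmp/file1.sorted" "$tmp/file2.sorted")
printf '%s\n' "$joined"

rm -r "$tmp"
```

Note that join emits its output in key order (both ARS-BFGL-BAC-10975 rows first, and so on), not in File1's original order, and that all of File1 has to be sorted first, which may be slow for a very large file.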

I would prefer an awk command. File1 will be very large. I have tried:

awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' file2 file1 > output 

Any help would be much appreciated. Thanks!

Sorry, I can't comment below, but to clarify: not every value in column 1 of File1 will be present in File2.

The awk command

awk 'FNR==NR{a[$1]=$2 FS $3;next} $1 in a {print $0, a[$1]}' 

will only keep as many rows as there are in File2. What I actually want is for a key such as ARS-BFGL-BAC-10975, which is repeated twice in File1 (realistically many more times), to appear twice in my output.
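The awk program above should print every matching File1 line, repeats included, as long as File2 is the first file argument: the FNR==NR block runs only while reading the first file, so that file becomes the lookup table and the second file is streamed line by line. One possible explanation for getting only as many rows as File2 has (an assumption, since the exact invocation is not shown) is that the files were passed in the other order. File1 then becomes the table, a repeated key overwrites its earlier entry, and File2 is what gets printed. The sketch below runs the same program both ways on small hypothetical sample files.

```shell
#!/bin/sh
# Hypothetical sample data: one File2 key that occurs twice in File1.
tmp=$(mktemp -d)
printf '%s\n' \
  'ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305' \
  'ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305' \
  > "$tmp/file1"
printf '%s\n' 'ARS-BFGL-BAC-10975 10 21225382' > "$tmp/file2"

# The program from the question, kept verbatim apart from spacing.
prog='FNR == NR { a[$1] = $2 FS $3; next } $1 in a { print $0, a[$1] }'

# file2 first: file2 is the in-memory table and file1 is streamed,
# so every matching file1 line is printed, repeated keys included.
right=$(awk "$prog" "$tmp/file2" "$tmp/file1")

# file1 first: file1 becomes the table, the second 10975 line overwrites
# the first, and only the single file2 line is printed.
reversed=$(awk "$prog" "$tmp/file1" "$tmp/file2")

printf '%s\n' "--- file2 file1:" "$right" "--- file1 file2:" "$reversed"
rm -r "$tmp"
```

Keeping File2 first also suits a very large File1, since only File2's keys and two extra columns are held in memory while File1 is read once, in its original order.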

Thanks for the help so far.

Merge files based on matching of first column
