Bumped by Community user

occurred Jun 26, 2018 at 16:07

clarify question

edit approved Mar 14, 2016 at 1:23

dcp1234

I have two files. File1:

ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11025 0.9092 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11044 0.9626 688423261 2 01/04/2015 0.9983763305
ARS-BFGL-BAC-11193 0.9544 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11025 0.9092 688423263 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11044 0.9626 688423263 2 01/04/2015 0.9983763305
ARS-BFGL-BAC-11193 0.9544 688423263 1 01/04/2015 0.9983763305

File2:

ARS-BFGL-BAC-10975 10 21225382
ARS-BFGL-BAC-11025 10 84516867
ARS-BFGL-BAC-11193 1 29303546

Desired output:

ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305 10 21225382
ARS-BFGL-BAC-11025 0.9092 688423261 1 01/04/2015 0.9983763305 10 84516867
ARS-BFGL-BAC-11193 0.9544 688423261 1 01/04/2015 0.9983763305 1 29303546
ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305 10 21225382
ARS-BFGL-BAC-11025 0.9092 688423263 1 01/04/2015 0.9983763305 10 84516867
ARS-BFGL-BAC-11193 0.9544 688423263 1 01/04/2015 0.9983763305 1 29303546

File1 has many more rows than File2. I only want to keep the File1 rows whose column 1 value also appears in File2.

I have tried join, but I can't get it to work: it tells me my files are not sorted.

join -j 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2,2.3 <(sort -k1 file1) <(sort -k1 file2)
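One possible reason for the "not sorted" complaint, offered as an assumption rather than a diagnosis: `sort -k1` uses a key that runs from field 1 to the end of the line, and `sort` and `join` can disagree on ordering under the current locale. A minimal sketch that restricts the sort key to field 1 and forces the C locale for both tools, using the sample data from the question (the scratch directory and the `file1.sorted`, `file2.sorted` and `joined` names are made up for illustration):

```shell
# Work in a scratch directory so the sample files do not clobber anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > file1 <<'EOF'
ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11025 0.9092 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11044 0.9626 688423261 2 01/04/2015 0.9983763305
ARS-BFGL-BAC-11193 0.9544 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11025 0.9092 688423263 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11044 0.9626 688423263 2 01/04/2015 0.9983763305
ARS-BFGL-BAC-11193 0.9544 688423263 1 01/04/2015 0.9983763305
EOF

cat > file2 <<'EOF'
ARS-BFGL-BAC-10975 10 21225382
ARS-BFGL-BAC-11025 10 84516867
ARS-BFGL-BAC-11193 1 29303546
EOF

# Sort both inputs on field 1 only, in the C locale, so that sort and join
# agree on the ordering of the join field.
LC_ALL=C sort -k1,1 file1 > file1.sorted
LC_ALL=C sort -k1,1 file2 > file2.sorted

# Join on field 1 of each file; rows without a partner
# (ARS-BFGL-BAC-11044 here) are dropped by default.
LC_ALL=C join -1 1 -2 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2,2.3 \
    file1.sorted file2.sorted > joined
```

Note that the result comes out in key order, not in File1's original order, so it will not match the desired output line for line. Sorting a very large File1 also costs more than a single awk pass over it.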

I would prefer an awk command. File1 will be very large. I have tried:

awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' file2 file1 > output

Any help would be much appreciated. Thanks

Sorry, I can't comment below, but to clarify: not every column 1 value in File1 will be present in File2.

The awk command

awk 'FNR==NR{a[$1]=$2 FS $3;next} $1 in a {print $0, a[$1]}'

only keeps as many rows as there are in File2. Ideally, where a key such as ARS-BFGL-BAC-10975 is repeated twice in File1 (realistically many more times), it should appear twice in my output.
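For reference, a minimal sketch of that filtered awk command run against the sample data, assuming the files are passed in the order `file2 file1` as in the earlier attempt (the scratch directory and the `output` name are illustrative). With that order, every File1 row whose key exists in File2 is printed, so repeated keys are repeated in the result:

```shell
# Work in a scratch directory so the sample files do not clobber anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > file1 <<'EOF'
ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11025 0.9092 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11044 0.9626 688423261 2 01/04/2015 0.9983763305
ARS-BFGL-BAC-11193 0.9544 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11025 0.9092 688423263 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11044 0.9626 688423263 2 01/04/2015 0.9983763305
ARS-BFGL-BAC-11193 0.9544 688423263 1 01/04/2015 0.9983763305
EOF

cat > file2 <<'EOF'
ARS-BFGL-BAC-10975 10 21225382
ARS-BFGL-BAC-11025 10 84516867
ARS-BFGL-BAC-11193 1 29303546
EOF

# First file (file2): FNR==NR holds, so store columns 2 and 3 under the
# column-1 key and skip to the next line.
# Second file (file1): print each row whose key was stored, followed by
# the stored columns; rows with unknown keys are skipped.
awk 'FNR==NR { a[$1] = $2 FS $3; next }
     $1 in a { print $0, a[$1] }' file2 file1 > output
```

With this sample data, `output` holds six rows in File1's original order, and each of the three keys appears twice. Only File2 is held in memory, while File1 is streamed line by line, which suits a very large File1. The file order matters: `FNR==NR` is true only while the first file named on the command line is being read, so swapping the arguments would index File1 instead.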

Thanks for the help so far

asked Mar 13, 2016 at 16:23

user160927

Merge files based on matching of first column

I have two files. File1:

ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11025 0.9092 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11044 0.9626 688423261 2 01/04/2015 0.9983763305
ARS-BFGL-BAC-11193 0.9544 688423261 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11025 0.9092 688423263 1 01/04/2015 0.9983763305
ARS-BFGL-BAC-11044 0.9626 688423263 2 01/04/2015 0.9983763305
ARS-BFGL-BAC-11193 0.9544 688423263 1 01/04/2015 0.9983763305

File2:

ARS-BFGL-BAC-10975 10 21225382
ARS-BFGL-BAC-11025 10 84516867
ARS-BFGL-BAC-11044 1 12805406
ARS-BFGL-BAC-11193 1 29303546

Desired output:

ARS-BFGL-BAC-10975 0.9303 688423261 1 01/04/2015 0.9983763305 10 21225382
ARS-BFGL-BAC-11025 0.9092 688423261 1 01/04/2015 0.9983763305 10 84516867
ARS-BFGL-BAC-11193 0.9544 688423261 1 01/04/2015 0.9983763305 1 29303546
ARS-BFGL-BAC-10975 0.9303 688423263 1 01/04/2015 0.9983763305 10 21225382
ARS-BFGL-BAC-11025 0.9092 688423263 1 01/04/2015 0.9983763305 10 84516867
ARS-BFGL-BAC-11193 0.9544 688423263 1 01/04/2015 0.9983763305 1 29303546

File1 has many more rows than File2. I only want to keep the File1 rows whose column 1 value also appears in File2.

I have tried join, but I can't get it to work: it tells me my files are not sorted.

join -j 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2,2.3 <(sort -k1 file1) <(sort -k1 file2)

I would prefer an awk command. File1 will be very large. I have tried:

awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' file2 file1 > output

Any help would be much appreciated. Thanks
