How to merge two files based on the matching of one column?

Question

I have two files, B.csv:

1,AD
2,AB
3,AC
5,AF
7,AE

and C.csv:

1,x
3,z
5,y

How do I get this output:

1,AD,x
2,AB,
3,AC,z
5,AF,y
7,AE,

by matching the common column 1 in both of the files?

What should be output for a key that exists in B but not C and vice-versa? Include those cases in your example.
– Ed Morton, Commented Feb 19, 2021 at 22:18

Grynn · Accepted Answer · 2021-02-19 17:14:55Z

4

Use join

join -t, -a1 B.csv C.csv

The -a1 option makes this a left outer join (i.e., it also shows lines from file1 that have no match in file2).
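As a rough check of the command above, the following sketch recreates the question's two sample files in a scratch directory and runs the same join. The scratch directory and the left_join variable are only there for illustration.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 1,AD 2,AB 3,AC 5,AF 7,AE > "$tmpdir/B.csv"
printf '%s\n' 1,x 3,z 5,y > "$tmpdir/C.csv"

# -t, sets the field separator to a comma;
# -a1 also prints lines from the first file that have no match in the second.
left_join=$(join -t, -a1 "$tmpdir/B.csv" "$tmpdir/C.csv")
printf '%s\n' "$left_join"

rm -rf "$tmpdir"
```

With this input the unpaired lines come out as 2,AB and 7,AE, without a trailing comma.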

If the commas at the end of unpaired lines really matter:

(join -t, B.csv C.csv ; join -t, -v1 B.csv C.csv | perl -pe "s/$/,/" ) | sort

edited Feb 19, 2021 at 17:14

answered Feb 19, 2021 at 16:42

Note that you will need to sort the files first if they are not already sorted.
– terdon ♦, Commented Feb 19, 2021 at 16:49
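A minimal sketch of that sorting step, assuming the input arrives in arbitrary order. The shuffled sample data, the *.sorted.csv names and the sorted_join variable are made up for the example; setting LC_ALL=C is one way to keep sort and join on the same collation order.

```shell
# Sample data from the question, deliberately shuffled.
tmpdir=$(mktemp -d)
printf '%s\n' 5,AF 1,AD 7,AE 3,AC 2,AB > "$tmpdir/B.csv"
printf '%s\n' 3,z 5,y 1,x > "$tmpdir/C.csv"

# Sort each file on the first comma-separated field only.
LC_ALL=C sort -t, -k1,1 "$tmpdir/B.csv" > "$tmpdir/B.sorted.csv"
LC_ALL=C sort -t, -k1,1 "$tmpdir/C.csv" > "$tmpdir/C.sorted.csv"

# Join the sorted copies exactly as before.
sorted_join=$(LC_ALL=C join -t, -a1 "$tmpdir/B.sorted.csv" "$tmpdir/C.sorted.csv")
printf '%s\n' "$sorted_join"

rm -rf "$tmpdir"
```

Note that join expects lexical order on the key, not numeric order, so a key such as 10 has to sort before 2.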
@Freddy if you really want commas at the end of unpaired fields, write a perl script or similar. You can use (join -t, a.csv b.csv ; join -t, -v1 a.csv b.csv | perl -pe "s/$/,/" ) | sort though it's not something I'd recommend
– Grynn, Commented Feb 19, 2021 at 17:11
@Grynn I don't care, I'm just nitpicking :)
– Freddy, Commented Feb 19, 2021 at 17:21
You can get the commas with join -t, -a1 -o 0,1.2,2.2 -e "" B.csv C.csv -- it's annoying that all the output fields must be specified
– glenn jackman, Commented Feb 19, 2021 at 17:45
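To see what the -o variant from this comment produces, here is a small sketch run against the question's sample data. The scratch directory and the padded_join variable are illustrative only.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 1,AD 2,AB 3,AC 5,AF 7,AE > "$tmpdir/B.csv"
printf '%s\n' 1,x 3,z 5,y > "$tmpdir/C.csv"

# -o lists the output fields: 0 is the join key, 1.2 is field 2 of the
# first file, 2.2 is field 2 of the second file.
# -e "" is the filler used for fields that are missing on unpaired lines.
padded_join=$(join -t, -a1 -o 0,1.2,2.2 -e "" "$tmpdir/B.csv" "$tmpdir/C.csv")
printf '%s\n' "$padded_join"

rm -rf "$tmpdir"
```

Because every output line now has three fields, the unpaired lines end in a comma (2,AB, and 7,AE,), which matches the output requested in the question.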


αғsнιη · Accepted Answer · 2021-02-22 09:29:54Z

Using awk keeps the lines of the original files in their original order, but it requires loading the first file named on the command line (fileC here) into memory, so take care not to run it on a file too big to fit in memory.

awk 'BEGIN { FS=OFS="," } NR==FNR { hold[$1]=$2; next } { print $0, hold[$1] }' fileC fileB
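The sketch below runs the same awk program on a shuffled copy of the question's data, to illustrate that the order of fileB is preserved. The scratch directory, the shuffled order and the awk_join variable are assumptions made for the example.

```shell
# fileB is deliberately unsorted; fileC holds the values to look up.
tmpdir=$(mktemp -d)
printf '%s\n' 5,AF 1,AD 7,AE 2,AB 3,AC > "$tmpdir/fileB"
printf '%s\n' 1,x 3,z 5,y > "$tmpdir/fileC"

# First pass (NR==FNR): remember column 2 of fileC under its key.
# Second pass: print each fileB line followed by the remembered value,
# which is an empty string when the key was not seen in fileC.
awk_join=$(awk 'BEGIN { FS=OFS="," }
  NR==FNR { hold[$1]=$2; next }
  { print $0, hold[$1] }' "$tmpdir/fileC" "$tmpdir/fileB")
printf '%s\n' "$awk_join"

rm -rf "$tmpdir"
```

The lines come out in fileB's order (5, 1, 7, 2, 3), and keys missing from fileC get a trailing comma.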

If there are keys that exist in fileC but not in fileB and you want to print those as well, use:

awk 'BEGIN { FS=OFS="," } NR==FNR { hold[$1]=$2; next } { print $0, hold[$1]; delete hold[$1] } END{ for(x in hold) print x, hold[x] }' fileC fileB
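A short sketch of this second command, with one extra key added to fileC that does not occur in fileB. The 9,w line is hypothetical and not part of the question's data; the scratch directory and the full_join variable are likewise only for illustration.

```shell
# Question data plus one hypothetical fileC-only key (9,w).
tmpdir=$(mktemp -d)
printf '%s\n' 1,AD 2,AB 3,AC 5,AF 7,AE > "$tmpdir/fileB"
printf '%s\n' 1,x 3,z 5,y 9,w > "$tmpdir/fileC"

# Each key consumed while reading fileB is deleted from hold, so the END
# block only sees keys that appeared in fileC alone.
full_join=$(awk 'BEGIN { FS=OFS="," }
  NR==FNR { hold[$1]=$2; next }
  { print $0, hold[$1]; delete hold[$1] }
  END { for (x in hold) print x, hold[x] }' "$tmpdir/fileC" "$tmpdir/fileB")
printf '%s\n' "$full_join"

rm -rf "$tmpdir"
```

The leftover key is printed last as 9,w, with two fields rather than three. With more than one leftover key, the order of those lines is unspecified, because awk's for (x in hold) does not guarantee an iteration order.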
