1

I have two files, B.csv:

1,AD
2,AB
3,AC
5,AF
7,AE

and C.csv:

1,x
3,z
5,y

How do I get this output:

1,AD,x
2,AB,
3,AC,z
5,AF,y
7,AE,

by matching the common column 1 in both of the files?

1
  • What should be output for a key that exists in B but not C and vice-versa? Include those cases in your example. Commented Feb 19, 2021 at 22:18

2 Answers

4

Use join

join -t, -a1 B.csv C.csv 

The -a1 option makes this a left outer join: in addition to the paired lines, join also prints the lines from file 1 (B.csv) that have no match in file 2 (C.csv).
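Because join expects both inputs to be sorted on the join field, a minimal sketch of the whole pipeline might look like the following. It assumes a POSIX shell and a scratch directory created with mktemp; the `*.sorted.csv` names and the shuffled input order are hypothetical, chosen only to show the sorting step.

```shell
# Hypothetical scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)

# Sample data from the question, deliberately written out of order.
printf '%s\n' '3,AC' '1,AD' '7,AE' '2,AB' '5,AF' > "$workdir/B.csv"
printf '%s\n' '5,y' '1,x' '3,z' > "$workdir/C.csv"

# join expects both inputs sorted on the join field (field 1 here).
sort -t, -k1,1 "$workdir/B.csv" > "$workdir/B.sorted.csv"
sort -t, -k1,1 "$workdir/C.csv" > "$workdir/C.sorted.csv"

# Left outer join: every line of B, extended with C's value when the key matches.
joined=$(join -t, -a1 "$workdir/B.sorted.csv" "$workdir/C.sorted.csv")
printf '%s\n' "$joined"

rm -r "$workdir"
```

With this data the unpaired keys 2 and 7 are printed as `2,AB` and `7,AE`, without a trailing comma.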

If the trailing commas on unpaired lines really matter:

(join -t, B.csv C.csv ; join -t, -v1 B.csv C.csv | perl -pe 's/$/,/' ) | sort
4
  • 1
    Note that you will need to sort the files first if they are not already sorted. Commented Feb 19, 2021 at 16:49
  • @Freddy if you really want commas at the end of unpaired fields, write a Perl script or similar. You can use (join -t, a.csv b.csv ; join -t, -v1 a.csv b.csv | perl -pe "s/$/,/" ) | sort though it's not something I'd recommend Commented Feb 19, 2021 at 17:11
  • @Grynn I don't care, I'm just nitpicking :) Commented Feb 19, 2021 at 17:21
  • 1
    You can get the commas with join -t, -a1 -o 0,1.2,2.2 -e "" B.csv C.csv -- it's annoying that all the output fields must be specified Commented Feb 19, 2021 at 17:45
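The -o/-e variant from the comment above can be tried on the question's data as follows. This is only a sketch: it assumes a join that accepts `0` in the -o list (as GNU coreutils does) and uses a hypothetical scratch directory.

```shell
# Hypothetical scratch directory holding the question's (already sorted) data.
workdir=$(mktemp -d)
printf '%s\n' '1,AD' '2,AB' '3,AC' '5,AF' '7,AE' > "$workdir/B.csv"
printf '%s\n' '1,x' '3,z' '5,y' > "$workdir/C.csv"

# -o 0,1.2,2.2 prints the join key, then field 2 of each file;
# -e '' fills a missing field with an empty string, so every row has three columns.
padded=$(join -t, -a1 -e '' -o 0,1.2,2.2 "$workdir/B.csv" "$workdir/C.csv")
printf '%s\n' "$padded"

rm -r "$workdir"
```

Here the unpaired rows come out as `2,AB,` and `7,AE,`, which matches the output requested in the question.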
3

Using awk, which keeps the lines in their original order. It does require loading the first file into memory, so take care not to run it on a file too big to fit in memory.

awk 'BEGIN { FS=OFS="," } NR==FNR { hold[$1]=$2; next } { print $0, hold[$1] }' fileC fileB 

For the case where a key exists in fileC but not in fileB, and you want to print those fileC records as well, do:

awk 'BEGIN { FS=OFS="," } NR==FNR { hold[$1]=$2; next } { print $0, hold[$1]; delete hold[$1] } END{ for(x in hold) print x, hold[x] }' fileC fileB 
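To see what the second command does with a key that exists only in fileC, here is a small sketch. The extra record `4,w` is hypothetical (it is not in the question's data), and the files live in a scratch directory created with mktemp.

```shell
# Hypothetical scratch directory; fileB and fileC mirror the names used above.
workdir=$(mktemp -d)
printf '%s\n' '1,AD' '2,AB' '3,AC' '5,AF' '7,AE' > "$workdir/fileB"
# "4,w" is an invented record whose key is missing from fileB.
printf '%s\n' '1,x' '3,z' '4,w' '5,y' > "$workdir/fileC"

# First pass (NR==FNR) stores fileC in hold[]; the second pass prints each fileB
# line with its match and deletes the key; END prints whatever is left of fileC.
merged=$(awk 'BEGIN { FS=OFS="," }
  NR==FNR { hold[$1]=$2; next }
  { print $0, hold[$1]; delete hold[$1] }
  END { for (x in hold) print x, hold[x] }' "$workdir/fileC" "$workdir/fileB")
printf '%s\n' "$merged"

rm -r "$workdir"
```

The fileB lines come out in their original order, and the leftover fileC record is appended at the end as `4,w`. Note that this row has only two fields as written; when several keys are left over, awk does not guarantee the order in which `for (x in hold)` visits them.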
