1

I have 2 files,

file1 ->

    1
    2
    2
    3
    5

file2 ->

    1
    3
    2
    6

I want the output to be stored in a third file called file3, as

    1,1,Match
    2,2,Match
    2,,NoMatch
    3,3,Match
    5,,NoMatch
    ,6,NoMatch

I've tried,

    sort file1 > file1sorted.txt
    sort file2 > file2sorted.txt

    # Combine the sorted files with a comma and store it in a new file
    paste -d ',' file1sorted.txt file2sorted.txt > mergedsortedfile.txt

    # Compare the columns and store the result in a new file
    awk -F',' '{print $1 == $2 ? "MATCH" : "NO MATCH"}' mergedsortedfile.txt > result.txt

    # Merge the result file with the already existing merged file
    paste -d ', ' mergedsortedfile.txt result.txt > final_result.txt

The result appears like this,

    1,1,MATCH
    2,2,MATCH
    2,3,NO MATCH
    3,6,NO MATCH
    5,,NO MATCH
1
  • Because matching values can sit on different line numbers in the two files, your approach won't work. I would pass both files to awk: while the first file is being read (this can be checked with the FILENAME variable), build an array; then, while the second file is being read, compare its lines with the array contents. Commented Feb 6, 2017 at 7:19
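The idea in this comment can be sketched roughly as follows. This is only an illustration under assumptions that are not in the original thread: the array names `left`, `right` and `seen` are made up, the values are assumed to be numbers, and `sort -s` (stable sort) is assumed to be available, which holds for GNU and BSD sort but is not required by POSIX. Each output line is prefixed with its value as a temporary sort key, which is stripped again by `cut`.

```shell
# Sample input from the question
printf '%s\n' 1 2 2 3 5 > file1
printf '%s\n' 1 3 2 6 > file2

awk '
    # First file: count how often each value occurs
    FILENAME == ARGV[1] { left[$0]++; seen[$0]; next }
    # Second file: same, in a separate array
    { right[$0]++; seen[$0] }
    END {
        for (v in seen) {
            l = left[v] + 0
            r = right[v] + 0
            n = (l < r) ? l : r
            # Occurrences present in both files
            for (i = 0; i < n; i++) print v "\t" v "," v ",Match"
            # Surplus occurrences in the first file only
            for (i = n; i < l; i++) print v "\t" v ",,NoMatch"
            # Surplus occurrences in the second file only
            for (i = n; i < r; i++) print v "\t" "," v ",NoMatch"
        }
    }
' file1 file2 |
    sort -n -s -k1,1 |   # order by the temporary key (assumes numeric values)
    cut -f2- > file3     # drop the key again

cat file3
```

Counting occurrences, rather than only recording that a value was seen, is what lets the duplicate `2` in file1 come out once as a match and once as `2,,NoMatch`.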

2 Answers

1

Using comm on the sorted data:

    $ comm <( sort -n file1 ) <( sort -n file2 )
                    1
                    2
    2
                    3
    5
            6

This output is tab-delimited. We can mark everything in columns 1 and 2 as "NoMatch" and in column 3 as "Match" with awk:

    $ comm <( sort -n file1 ) <( sort -n file2 ) |
      awk -F$'\t' 'BEGIN { OFS="," } $3 { print $3, $3, "Match"; next } { print $1, $2, "NoMatch" }'
    1,1,Match
    2,2,Match
    2,,NoMatch
    3,3,Match
    5,,NoMatch
    ,6,NoMatch

The awk script will read tab-delimited input (-F$'\t') and use commas for the output field delimiter (OFS=","). If there's something in field 3, then it will output it twice with Match in the third field and continue with the next line. Otherwise, it will output fields 1 and 2 from the input together with NoMatch in the third field.
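A variation on the same pipeline that writes the result to file3, as the question asks, might look like the sketch below. It departs from the answer in three ways, all of which are assumptions: it avoids the bash-only `<( … )` and `$'\t'` syntax by using intermediate files (the `.sorted` names are made up), it sorts in the C locale so that `comm` sees the collation order it expects, and it tests `$3 != ""` instead of `$3` so that a line containing `0` would still count as a match. The result is ordered lexically rather than numerically, which makes no difference for the single-digit sample data.

```shell
# Sample input from the question
printf '%s\n' 1 2 2 3 5 > file1
printf '%s\n' 1 3 2 6 > file2

# Sort both files in the same (C locale) order that comm will use
LC_ALL=C sort file1 > file1.sorted
LC_ALL=C sort file2 > file2.sorted

# Column 3 of comm's output holds lines common to both files;
# columns 1 and 2 hold lines unique to file1 and file2 respectively
LC_ALL=C comm file1.sorted file2.sorted |
    awk -F '\t' 'BEGIN { OFS = "," }
        $3 != "" { print $3, $3, "Match"; next }
        { print $1, $2, "NoMatch" }' > file3

cat file3
```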

1
  • Good! If this solves your issue, please consider accepting the answer. Commented Feb 6, 2017 at 11:57
0

Save this Perl script as a file named xxx and run it with perl xxx file1 file2

    #!/usr/bin/perl

    # save the first two file names, the <> slurp clears @ARGV
    ($f1,$f2) = @ARGV;

    # build a hash of arrays of lines from all files,
    # with the filename as key
    while (<>) { chomp; push @{$hash{$ARGV}}, $_ }

    # compare the files line by line until both are empty
    # the hash slice is a short expression for
    # $a = $hash{$f1}->[$x]
    # $b = $hash{$f2}->[$x]
    for ($x=0;;$x++) {
        ($a,$b) = map { $$_[$x] } @hash{$f1,$f2};
        last unless $a or $b;
        printf "%s,%s,%s\n", $a, $b, $a eq $b ? 'Match' : 'NoMatch';
    }
