Comparing two files and writing mismatched rows along with mismatched columns. Pointing out the mismatched columns is my main problem statement

Question

I have a requirement to compare two files column by column and write the differences to another file, with some marker identifying the mismatched columns. Pointing out the mismatched columns is my main problem. For example, we have files like:

File 1

    1|piyush|bangalore|dev
    1|piyush|bangalore|QA
    2|pankaj|bangalore|dev
    3|rohit|delhi|QA

File 2

    1|piyush|bangalore|QA
    1|piyush|bangalore|QA
    2|pankaj|bangalore|dev
    3|rohit|bangalore|dev

The expected output file looks something like this:

    File 1
    1|piyush|bangalore|**dev**
    File 2
    1|piyush|bangalore|**QA**
    File 1
    3|rohit|**delhi**|**QA**
    File 2
    3|rohit|**bangalore**|**dev**

I want to achieve something like this, where I can see the mismatched columns as well as the mismatched rows. I have tried

diff File1 File2 > Diff_File

But this gives me only the mismatched records (rows). I cannot find a way to point out the mismatched columns as well. Please help me out if it is possible to do this using a shell script or an awk command, as I am very new to this. Thanks in advance.

Possible duplicate of compare two columns of different files and print if it matches — Yaron
– Yaron, Commented May 8, 2017 at 12:10
Hi Yaron, thanks for the link, but that is not a duplicate; please read the whole explanation of my problem. I need to compare all the columns for each row and get a diff file in which I can point out the unmatched columns. Pointing out the unmatched columns is my main problem — piyush
– piyush, Commented May 8, 2017 at 12:16
@piyush, there's no 3|rohit|**delhi**|**QA** row within your input files. Update your expected output — RomanPerekhrest
– RomanPerekhrest, Commented May 8, 2017 at 15:05
Hi @RomanPerekhrest, sorry, and thanks for pointing that out. I have updated my question. — piyush
– piyush, Commented May 8, 2017 at 15:20

RomanPerekhrest · Accepted Answer · 2017-05-08 19:24:35Z

Python 3.x solution:

diff_marked.py script:

    import sys

    file1_name = sys.argv[1]
    file2_name = sys.argv[2]

    with open(file1_name, 'r') as f1, open(file2_name, 'r') as f2:
        f1_lines = f1.readlines()    # list of lines of File1
        f2_lines = f2.readlines()    # list of lines of File2

        for k, l in enumerate(f1_lines):
            f1_fields = l.strip().split('|')    # splitting a line into fields by separator '|'
            if k < len(f2_lines) and f2_lines[k]:
                has_diff = False
                f2_fields = f2_lines[k].strip().split('|')
                for i, f in enumerate(f1_fields):
                    if f != f2_fields[i]:    # comparing respective lines 'field-by-field' between two files
                        f1_fields[i] = '**' + f + '**'    # wrapping differing fields
                        f2_fields[i] = '**' + f2_fields[i] + '**'
                        has_diff = True

                if has_diff:
                    print(f1.name)    # print file name
                    print('|'.join(f1_fields))
                    print(f2.name)
                    print('|'.join(f2_fields))

Usage (your Python version may differ; this was tested on Python 3.5):

python3.5 diff_marked.py File1 File2 > diff_output

diff_output contents:

    File1
    1|piyush|bangalore|**dev**
    File2
    1|piyush|bangalore|**QA**
    File1
    3|rohit|**delhi**|**QA**
    File2
    3|rohit|**bangalore**|**dev**
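If a row can have more fields in one file than in the other, the field-by-field comparison could be wrapped in a small helper. The following is only a sketch and not part of the script above: the function name `mark_diff_fields` and its `sep` and `mark` parameters are made up for illustration. It uses `itertools.zip_longest` so that a field missing on one side counts as a difference instead of raising `IndexError`.

```python
from itertools import zip_longest


def mark_diff_fields(line1, line2, sep="|", mark="**"):
    """Return both lines with differing fields wrapped in `mark`.

    Returns None when every field matches. (Hypothetical helper,
    shown for illustration only.)
    """
    fields1 = line1.rstrip("\n").split(sep)
    fields2 = line2.rstrip("\n").split(sep)
    out1, out2 = [], []
    has_diff = False
    # zip_longest pads the shorter row with None, so a missing field
    # never compares equal to a real (possibly empty) field.
    for a, b in zip_longest(fields1, fields2):
        if a != b:
            has_diff = True
            a = mark + (a or "") + mark
            b = mark + (b or "") + mark
        out1.append(a)
        out2.append(b)
    if not has_diff:
        return None
    return sep.join(out1), sep.join(out2)


print(mark_diff_fields("3|rohit|delhi|QA", "3|rohit|bangalore|dev"))
# ('3|rohit|**delhi**|**QA**', '3|rohit|**bangalore**|**dev**')
```

Keeping the comparison in a function that returns `None` for identical rows leaves the file handling and printing to the caller, so the same logic can be reused for a different separator or marker.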

Thanks, RomanPerekhrest, for the answer. I was looking for a shell script. As of now I am using the dwdiff utility to do my work. I will accept your answer, as it is a good solution to the stated problem. — piyush
– piyush, Commented May 16, 2017 at 12:05
