
I have a requirement to compare two files column by column and write the differences to another file, along with some marker identifying the mismatched columns. Pointing out the mismatched columns is my main problem. For example, given these files:

File 1

    1|piyush|bangalore|dev
    1|piyush|bangalore|QA
    2|pankaj|bangalore|dev
    3|rohit|delhi|QA

File 2

    1|piyush|bangalore|QA
    1|piyush|bangalore|QA
    2|pankaj|bangalore|dev
    3|rohit|bangalore|dev

The expected output file looks something like this:

    File 1
    1|piyush|bangalore|**dev**
    File 2
    1|piyush|bangalore|**QA**
    File 1
    3|rohit|**delhi**|**QA**
    File 2
    3|rohit|**bangalore**|**dev**

I want to achieve something like this, where I can see the mismatched columns as well as the mismatched rows. I have tried:

    diff File1 File2 > Diff_File

But this gives me only the mismatched records (rows); I can't find any way to point out the mismatched columns as well. Please help me if it's possible to do this using a shell script or an awk command, as I am very new to this. Thanks in advance.
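The core of the problem is that `diff` compares whole lines, while marking columns needs a position-by-position comparison of the fields after splitting on `|`. A minimal Python sketch of that step, assuming both rows use `|` as the separator and that fields line up by position; the function name `mark_mismatches` and its `marker` parameter are hypothetical, chosen for illustration:

```python
from itertools import zip_longest


def mark_mismatches(row1, row2, sep='|', marker='**'):
    """Compare two delimited rows field by field.

    Returns the two rows re-joined with every differing field wrapped in
    `marker`, or None when the rows are identical.
    """
    fields1 = row1.rstrip('\n').split(sep)
    fields2 = row2.rstrip('\n').split(sep)
    if fields1 == fields2:
        return None
    marked1, marked2 = [], []
    # zip_longest pads the shorter row with '' so a missing field shows up as a mismatch
    for a, b in zip_longest(fields1, fields2, fillvalue=''):
        if a != b:
            a, b = marker + a + marker, marker + b + marker
        marked1.append(a)
        marked2.append(b)
    return sep.join(marked1), sep.join(marked2)


if __name__ == '__main__':
    # the differing 4th rows from the example files
    print(mark_mismatches('3|rohit|delhi|QA', '3|rohit|bangalore|dev'))
    # identical rows produce None
    print(mark_mismatches('2|pankaj|bangalore|dev', '2|pankaj|bangalore|dev'))
```

Run over the 4th rows of the example, this returns `('3|rohit|**delhi**|**QA**', '3|rohit|**bangalore**|**dev**')`, matching the expected output above. Pairing rows by line number (rather than by a key column) is an assumption; it fits the example, where the first column is not unique.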

  • Possible duplicate of compare two columns of different files and print if it matches Commented May 8, 2017 at 12:10
  • Hi Yaron, thanks for the link, but that is not a duplicate; please read my whole explanation of the problem. I need to compare all the columns of each row and get a diff file that points out the unmatched columns. Pointing out the unmatched columns is my main problem. Commented May 8, 2017 at 12:16
  • @piyush, there's no 3|rohit|**delhi**|**QA** row within your input files. Update your expected output Commented May 8, 2017 at 15:05
  • Hi @RomaPerekhrest, sorry, and thanks for pointing that out. I have updated my question. Commented May 8, 2017 at 15:20

1 Answer


Python 3.x solution:

diff_marked.py script:

    import sys

    file1_name = sys.argv[1]
    file2_name = sys.argv[2]

    with open(file1_name, 'r') as f1, open(file2_name, 'r') as f2:
        f1_lines = f1.readlines()  # list of lines of File1
        f2_lines = f2.readlines()  # list of lines of File2

    for k, l in enumerate(f1_lines):
        if k >= len(f2_lines):  # File2 has no more lines to compare against
            break
        f1_fields = l.strip().split('|')  # splitting a line into fields by separator '|'
        f2_fields = f2_lines[k].strip().split('|')
        # rows with a different number of fields count as a mismatch
        has_diff = len(f1_fields) != len(f2_fields)
        # zip() stops at the shorter row, so uneven rows no longer raise IndexError
        for i, (a, b) in enumerate(zip(f1_fields, f2_fields)):
            if a != b:  # comparing respective lines field-by-field between two files
                f1_fields[i] = '**' + a + '**'  # wrapping differing fields
                f2_fields[i] = '**' + b + '**'
                has_diff = True
        if has_diff:
            print(file1_name)  # print file name
            print('|'.join(f1_fields))
            print(file2_name)
            print('|'.join(f2_fields))

Usage (tested on Python 3.5; substitute the name of your installed Python 3 interpreter):

    python3.5 diff_marked.py File1 File2 > diff_output

diff_output contents:

    File1
    1|piyush|bangalore|**dev**
    File2
    1|piyush|bangalore|**QA**
    File1
    3|rohit|**delhi**|**QA**
    File2
    3|rohit|**bangalore**|**dev**
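The script pairs lines by position and stops at the end of the shorter file, so lines that exist in only one file are never reported. If that matters, one possible variant swaps in `itertools.zip_longest`, which keeps iterating until both inputs are exhausted. This is a sketch, not part of the answer above: the generator name `compare_files` and the `<missing>` placeholder are hypothetical, and it still assumes lines correspond by position.

```python
import sys
from itertools import zip_longest


def compare_files(lines1, lines2, name1, name2, sep='|', marker='**'):
    """Yield output lines for every row pair that differs.

    zip_longest() yields None for the side that has run out of lines;
    that side is printed as the placeholder '<missing>'.
    """
    for row1, row2 in zip_longest(lines1, lines2):
        if row1 is None or row2 is None:
            yield name1
            yield row1.rstrip('\n') if row1 is not None else '<missing>'
            yield name2
            yield row2.rstrip('\n') if row2 is not None else '<missing>'
            continue
        f1 = row1.rstrip('\n').split(sep)
        f2 = row2.rstrip('\n').split(sep)
        if f1 == f2:
            continue
        m1, m2 = [], []
        # pad the shorter row with empty fields so every position is compared
        for a, b in zip_longest(f1, f2, fillvalue=''):
            if a != b:
                a, b = marker + a + marker, marker + b + marker
            m1.append(a)
            m2.append(b)
        yield name1
        yield sep.join(m1)
        yield name2
        yield sep.join(m2)


if __name__ == '__main__' and len(sys.argv) == 3:
    with open(sys.argv[1]) as fh1, open(sys.argv[2]) as fh2:
        for out in compare_files(fh1, fh2, sys.argv[1], sys.argv[2]):
            print(out)
```

It is invoked the same way as `diff_marked.py`; on the example files it prints the same eight lines as `diff_output`. Because file objects are iterated lazily, this version also avoids reading both files fully into memory with `readlines()`.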
  • Thanks RomanPerekhrest for the answer. I was looking for a shell script; for now I am using the dwdiff utility to do my work. I will accept your answer, as it is a good solution to the stated problem. Commented May 16, 2017 at 12:05
