I have a thousand files and I need to verify that they all contain exactly the same information in the second column, up to a certain number of lines. An example is below. I would like to print the names of the files whenever the first 5 lines of the second column differ, as they do for file1.txt and file2.txt here. In this case the result should be: "difference between files file1.txt and file2.txt"

file1.txt

jose 50
maria 50
fernando 50
andres 50
martin 30
pablo 30
. . .

file2.txt

julia 50
julio 50
alan 50
ruth 50
ana 40
manuel 40
. . .
  • Are you only going to compare two at a time? Commented Apr 18, 2019 at 20:11
  • No, I want to compare the files file2.txt, ..., file999.txt with the first (file1.txt) Commented Apr 19, 2019 at 4:55

1 Answer

Hmm. I think I would loop over the files with a for loop and compare them with comm.

/tmp ❯ comm -3 <(cat file1.txt|awk '{print $2}') <(cat file2.txt|awk '{print $2}')
30
30
	40
	40

Note that the 30s and 40s in the output are the second-column values that differ between the two files. Some basic usage of comm:

comm -1 -3 <(sort -u FILE1.txt) <(sort -u FILE2.txt)

  • -1 suppress lines unique to FILE1
  • -2 suppress lines unique to FILE2
  • -3 suppress lines that appear in both files
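A small illustration of these flags may help. It is only a sketch: left.txt and right.txt are hypothetical sample files created in a temporary directory, and both are already sorted, which comm expects (hence the sort -u in the usage line above).

```shell
# Two small, already-sorted sample files (hypothetical data).
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/left.txt"
printf 'b\nc\nd\n' > "$dir/right.txt"

# -2 -3: drop columns 2 and 3, leaving lines unique to left.txt
only_left=$(comm -23 "$dir/left.txt" "$dir/right.txt")
# -1 -3: drop columns 1 and 3, leaving lines unique to right.txt
only_right=$(comm -13 "$dir/left.txt" "$dir/right.txt")
# -1 -2: drop columns 1 and 2, leaving lines present in both files
in_both=$(comm -12 "$dir/left.txt" "$dir/right.txt")

printf 'only in left:  %s\n' "$only_left"
printf 'only in right: %s\n' "$only_right"
printf 'in both:       %s\n' "$(printf '%s' "$in_both" | tr '\n' ' ')"

rm -r "$dir"
```

Combining the flags this way selects a single column of comm's three-column output, which is usually easier to read than the tab-indented default.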

So, to put all this together, something like:

cd /path/to/files && find . -type f -name "*.txt" | while read -r filename
do
    echo "*** Checking $filename ***"
    # Compare the second column of the reference file with that of the current file
    comm -3 <(cat reference.txt|awk '{print $2}') <(cat "$filename"|awk '{print $2}')
    echo ""
done
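The question asks specifically about the first 5 lines and about printing a message naming the files. One possible way to do that is sketched here; it swaps comm for a plain string comparison so that line order is respected. The function name compare_second_column and its argument order are assumptions made for illustration, not something from the original post.

```shell
# Hypothetical helper: compare_second_column REFERENCE N FILE...
# Prints a message for every FILE whose second column differs from
# REFERENCE within the first N lines. The body runs in a subshell
# (parentheses) so its variables do not leak into the caller.
compare_second_column() (
    ref=$1
    n=$2
    shift 2

    # Second column of the first N lines of the reference, computed once.
    ref_col=$(awk -v n="$n" 'NR > n { exit } { print $2 }' "$ref")

    for f in "$@"; do
        [ "$f" = "$ref" ] && continue   # skip the reference itself
        col=$(awk -v n="$n" 'NR > n { exit } { print $2 }' "$f")
        if [ "$col" != "$ref_col" ]; then
            echo "difference between files $ref and $f"
        fi
    done
)
```

Assuming the files sit in the current directory, it could be called as `compare_second_column file1.txt 5 file*.txt`. With the sample data from the question this would report file2.txt, because line 5 holds 30 in one file and 40 in the other.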
