3

I have two text files. Let's call them file1.txt and file2.txt.

file1.txt is as follows

chr10 181144 225933
chr10 181243 225933
chr10 181500 225933
chr10 226069 255828
chr10 255989 267134
chr10 255989 282777
chr10 267297 282777
chr10 282856 283524
chr10 283618 285377
chr10 285466 285995

file2.txt is as follows

chr10 181144 225933
chr10 181243 225933
chr10 181500 225933
chr10 255989 282777
chr10 267297 282777
chr10 282856 283524
chr10 375542 387138
chr10 386930 387138
chr10 387270 390748
chr10 390859 390938
chr10 391051 394580
chr10 394703 395270

What I want to output in a single file is

  1. All the common lines between file1 and file2
  2. All the lines which are in file1 but are not common to both
  3. All the lines which are in file2 but are not common to both.

I wrote a Perl script to do this, but I am pretty sure there must be a command-line tool or some other easier way to do it.

2
  • The command to do this is comm. Commented Sep 12, 2014 at 18:20
  • sort -u file1.txt file2.txt is the obvious answer here unless you want the lines in the output to be in that particular order... Commented Feb 25, 2017 at 13:38

2 Answers

8

Lines common to both files:

comm -12 file1.txt file2.txt > results.txt 

Add lines unique to file1.txt:

comm -23 file1.txt file2.txt >> results.txt 

Add lines unique to file2.txt:

comm -13 file1.txt file2.txt >> results.txt 

If the files are not already sorted, you must sort them first, for example with process substitution if your shell supports it:

comm -12 <(sort file1.txt) <(sort file2.txt) 

and likewise for the -23 and -13 variants.
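The three steps above can be combined into one script that sorts each input once and writes everything to a single file. This is only a minimal sketch: the sample data is a shortened subset of the question's files, and the `*.sorted` file names and the `#` section labels are assumptions added for illustration, not part of the original answer.

```shell
#!/bin/sh
# Use one fixed collation so that sort and comm agree on the ordering.
export LC_ALL=C

# Sample inputs (shortened subset of the question's data, for illustration).
printf '%s\n' 'chr10 181144 225933' 'chr10 226069 255828' 'chr10 255989 282777' > file1.txt
printf '%s\n' 'chr10 181144 225933' 'chr10 255989 282777' 'chr10 375542 387138' > file2.txt

# comm requires sorted input; -u also drops duplicate lines within a file.
sort -u file1.txt > file1.sorted
sort -u file2.txt > file2.sorted

# Write all three sections to one file. The labels are hypothetical; remove
# the echo lines if only the data lines are wanted.
{
  echo '# common to both'
  comm -12 file1.sorted file2.sorted
  echo '# only in file1.txt'
  comm -23 file1.sorted file2.sorted
  echo '# only in file2.txt'
  comm -13 file1.sorted file2.sorted
} > results.txt
```

Sorting into intermediate files avoids process substitution, so the sketch should also run in shells that lack it.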

2
  • 1
    You must sort the two files first. Commented Sep 12, 2014 at 18:23
  • 1
    We also need the -u option of sort to prevent duplicated lines. Commented Sep 12, 2014 at 19:14
7

There is a comm command to do this job, but you can also do it by combining other standard tools such as grep, sort, uniq, and join. Here is a solution using grep, with the equivalent comm command after each.

Lines common to both files:

grep -xF -f file1 file2
comm -12 <(sort -u file1) <(sort -u file2)

Lines only in file1:

grep -vxF -f file2 file1
comm -23 <(sort -u file1) <(sort -u file2)

Lines only in file2:

grep -vxF -f file1 file2
comm -13 <(sort -u file1) <(sort -u file2)
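As a rough illustration of the grep variant, the sketch below writes all three groups to one file without sorting the inputs first. The sample data is a shortened, deliberately unsorted subset of the question's files, and the output name `results_grep.txt` is an assumption. Unlike the comm version, grep keeps the lines in their original order and does not remove duplicates.

```shell
#!/bin/sh
# Sample inputs, intentionally unsorted (shortened subset, for illustration).
printf '%s\n' 'chr10 226069 255828' 'chr10 181144 225933' 'chr10 255989 282777' > file1
printf '%s\n' 'chr10 375542 387138' 'chr10 181144 225933' 'chr10 255989 282777' > file2

# -F: treat patterns as fixed strings, -x: match whole lines only,
# -f FILE: read the patterns from FILE, -v: invert the match.
{
  grep -xF -f file1 file2    # lines common to both, in file2's order
  grep -vxF -f file2 file1   # lines only in file1
  grep -vxF -f file1 file2   # lines only in file2
} > results_grep.txt
```

Note that grep exits with status 1 when it selects no lines, which matters if the script runs under `set -e`.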
