list the difference and overlap between two plain data set [duplicate]

Question

Possible Duplicate:
Linux tools to treat files as sets and perform set operations on them

I have two data sets, A and B. The format for each data set is one number per line. For instance,

12345
23456
67891
2345900
12345

Some of the data in A are not included in B. How can I list all the entries that appear only in A, and how can I list all the entries shared by A and B, using Linux/UNIX commands?

Possible duplicate: "Linux tools to treat files as sets and perform set operations on them"
– sr_, Commented Jan 11, 2012 at 16:23

Tim Kennedy · Accepted Answer · 2012-01-11 16:49:11Z

16

Use the comm command.

If your lists are in files listA and listB:

comm listA listB

By default, comm prints three columns: items only in listA, items only in listB, and items common to both lists.

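You can suppress individual columns with the -1, -2, or -3 options. For example, comm -23 listA listB prints only the items unique to listA, and comm -12 listA listB prints only the items common to both lists.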
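As a minimal sketch of the column options, assume two hypothetical sample files A and B (the contents of B are invented for illustration). Both inputs are sorted first, since comm expects sorted input, and LC_ALL=C is set only to make the sort order predictable:

```shell
# Hypothetical sample data; the file names and the contents of B are assumptions.
tmp=$(mktemp -d)
printf '%s\n' 12345 23456 67891 2345900 12345 > "$tmp/A"
printf '%s\n' 23456 99999 12345 > "$tmp/B"

# comm needs both inputs sorted with the same collation it uses itself.
export LC_ALL=C
sort -u "$tmp/A" > "$tmp/listA"
sort -u "$tmp/B" > "$tmp/listB"

# -23 hides columns 2 and 3, leaving only the lines unique to listA.
only_in_a=$(comm -23 "$tmp/listA" "$tmp/listB")
# -12 hides columns 1 and 2, leaving only the lines common to both.
in_both=$(comm -12 "$tmp/listA" "$tmp/listB")

printf 'Only in A:\n%s\n' "$only_in_a"
printf 'In both:\n%s\n' "$in_both"

rm -rf "$tmp"
```

Using sort -u also drops the repeated 12345 from A, so each entry is reported once.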


8

The answer assumes listA and listB are already sorted. A more general solution: comm <(sort listA) <(sort listB)

– HongboZhu, Commented Sep 19, 2014 at 7:33

Very simple solution. Is the comm command available on all Linux distros?

– рüффп, Commented Jul 6, 2015 at 8:07


neuron34 · Accepted Answer · 2012-01-11 17:29:58Z

This will give you the unique items that exist in A but not in B:

perl -ne 'chomp; print "$_\n" unless `grep -Fx -- "$_" B`' A | sort -u

This will give you the list of common items in both A and B:

xargs -I {} grep -Fx -- {} B < A | sort -u
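The two commands above start one grep process per line of A. A possible alternative, not part of the original answer, is to hand all of B to a single grep as a pattern file with -f. The sketch below assumes the same hypothetical sample data as elsewhere (the contents of B are invented), and uses -F and -x so that each pattern must match a whole line literally:

```shell
# Hypothetical sample data; the file names and the contents of B are assumptions.
tmp=$(mktemp -d)
printf '%s\n' 12345 23456 67891 2345900 12345 > "$tmp/A"
printf '%s\n' 23456 99999 12345 > "$tmp/B"

# -f reads patterns from B, -F treats them as fixed strings,
# -x requires a whole-line match, -v inverts the selection.
only_in_a=$(grep -Fxv -f "$tmp/B" "$tmp/A" | LC_ALL=C sort -u)
in_both=$(grep -Fx -f "$tmp/B" "$tmp/A" | LC_ALL=C sort -u)

printf 'Only in A:\n%s\n' "$only_in_a"
printf 'In both:\n%s\n' "$in_both"

rm -rf "$tmp"
```

Unlike comm, this approach does not need the inputs to be sorted; sort -u is used here only to remove repeated entries from the output.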
