2

I have a question regarding comparison of multiple lists. I have a "master list" and 5 sublists. Some of the items in the 5 sublists are identical, and not all of them match the ones in the master list. I know which of these are in each, however the master list is large. This might be kind of confusing, but I need to identify the overlaps in these sublists to mark for different colors in networkx.

My code right now: (and it doesn't work)

    master = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    s1 = [1, 2, 3]
    s2 = [1, 3, 4]
    s3 = [1, 2, 6]
    s4 = [2, 3, 4]

    colors = []
    for m in masterlist:
        if m in s1 and m in s2 and m in s3:
            colors.append('magenta')
        elif m in s1 and m in s3 and m in s4:
            colors.append('blue')
        elif m in s1 and m in s2 and m in s4:
            colors.append('green')
        elif m in s1 and m in s3:
            colors.append('cyan')
        elif m in s2 and m in s4:
            colors.append('tan')
        elif m in s1:
            colors.append('aquamarine')
        elif m in s2:
            colors.append('gold')
        elif m in s3:
            colors.append('yellow')
        elif m in s4:
            colors.append('black')
        else:
            colors.append('gray')
    print colors

Desired output:

    ['gray', 'magenta', 'blue', 'green', 'tan', 'gray', 'yellow', 'gray', 'gray', 'gray', 'gray']

I noticed that the points where it doesn't work are the lines with two AND statements. Does anyone know how I should change this? Should I use something like 'contains'?

I also need to know where the overlaps occur, by color. So I've been using a count method for the colors list:

    print "s1-2-3 overlaps:", colors.count('magenta')
    print "s1-3-4 overlaps:", colors.count('blue')
    print "s1-2-4 overlaps:", colors.count('green')
    print "s1 unique:", colors.count('aquamarine')
    ...

The output I need, based on the example above, is a list of color strings. If an item in the master list is contained in all 5 sublists, I need a color name to be in the same position in the color list as in the master list. Then, for all remaining items in the sublists, I need a different color appended to the color list for each one, with all other items in the master list that match no sublist getting one shared color. Again, this is for a networkx graph, so the colors will correspond to nodes.

I will be doing this 30+ times making many graphs, so I need the colors for matches within the elif statements to remain the same, so that they are same for each. Sometimes items in sublists will match, sometimes they won't.

9
  • Could you elaborate on what types of values are in your lists? Strings? Integers? Etc? Commented Jul 14, 2014 at 20:43
  • Random suggestion: since you keep testing m in s1 and such over and over, maybe evaluate each of them once, assign them to variables (m1 = m in s1), and then just test those booleans. Might not solve the problem but it should speed up your code. Commented Jul 14, 2014 at 20:44
  • I would also highly recommend looking into using set for this, specifically the intersection method. Assuming of course you don't have/care about duplicates. Commented Jul 14, 2014 at 20:46
  • They are strings of amino acid sequences. Commented Jul 14, 2014 at 20:47
  • You can simply chain intersections, i.e. a.intersection(b).intersection(c) will give you the elements that are in a, b, and c. Or you can pass multiple iterables to intersection to get the same effect. Commented Jul 14, 2014 at 20:48

3 Answers

2

Given the information in your comments, I think something like this is what you are after:

    master = {'A', 'B', 'C', 'D', 'E'}
    s1 = {'A', 'B', 'E'}
    s2 = {'B', 'D', 'E'}
    s3 = {'E', 'A', 'C'}

    >>> master.intersection(s1, s2, s3)
    {'E'}
    >>> master.intersection(s1)
    {'A', 'B', 'E'}
    >>> master.intersection(s1, s2)
    {'B', 'E'}

And so on. You should be able to derive how to append the intersections from this pretty easily.
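One way to get from intersections to a color list is sketched below for Python 3. It assumes the question's four sublists are stored as sets; the names `rules` and `color_for` are illustrative and do not come from the question. The rule order mirrors the question's if/elif chain, so the first matching intersection decides the color.

```python
master = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
s1, s2, s3, s4 = {1, 2, 3}, {1, 3, 4}, {1, 2, 6}, {2, 3, 4}

# Ordered rules: the first intersection that contains the item wins,
# just like the first matching branch of an if/elif chain.
rules = [
    (s1 & s2 & s3, 'magenta'),
    (s1 & s3 & s4, 'blue'),
    (s1 & s2 & s4, 'green'),
    (s1 & s3, 'cyan'),
    (s2 & s4, 'tan'),
    (s1, 'aquamarine'),
    (s2, 'gold'),
    (s3, 'yellow'),
    (s4, 'black'),
]

def color_for(item, default='gray'):
    """Return the color of the first rule whose set contains item."""
    for members, color in rules:
        if item in members:
            return color
    return default

colors = [color_for(m) for m in master]
print(colors)
```

Because `master` stays a list, the colors come out in the same order as the nodes, which is what a per-node color list needs.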

If instead you are looking for specific overlaps, i.e. sublists, @stark's answer is probably more useful, although you may be able to accomplish this with set's subset or superset functionality as well.

UPDATE 1

Example of using supersets (obviously not an exhaustive set of cases, but it should get you going in the right direction):

    masters = [{'A', 'B', 'C'}, {'A', 'C'}, {'B', 'C'}, {'A', 'B', 'C', 'E'}]
    s1 = {'A', 'E'}
    s2 = {'B', 'C'}
    s3 = {'A', 'C', 'E'}

    colors = []
    for mlist in masters:
        if mlist.issuperset(s1) and mlist.issuperset(s2) and mlist.issuperset(s3):
            colors.append('magenta')
        elif mlist.issuperset(s1) and mlist.issuperset(s3):
            colors.append('blue')
        elif mlist.issuperset(s2) and mlist.issuperset(s3):
            colors.append('green')
        elif mlist.issuperset(s1):
            colors.append('green')
        elif mlist.issuperset(s2):
            colors.append('gold')
        elif mlist.issuperset(s3):
            colors.append('yellow')
        else:
            colors.append('grey')

    >>> colors
    ['gold', 'grey', 'gold', 'magenta']

UPDATE 2

Based on your further explanation, I think that this is what you are looking for:

    master = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    s1 = {1, 2, 3}
    s2 = {1, 3, 4}
    s3 = {1, 2, 6}
    s4 = {2, 3, 4}

    # A color key is (in_s1, in_s2, in_s3, in_s4)
    color_map = {(True, True, True, False):    'magenta',
                 (True, False, True, True):    'blue',
                 (True, True, False, True):    'green',
                 (True, False, True, False):   'cyan',
                 (False, True, False, True):   'tan',
                 (True, False, False, False):  'aquamarine',
                 (False, True, False, False):  'gold',
                 (False, False, True, False):  'yellow',
                 (False, False, False, True):  'black',
                 (False, False, False, False): 'grey'}

    def color_key(element):
        return element in s1, element in s2, element in s3, element in s4

    def color_list(in_list):
        return [color_map[color_key(element)] for element in in_list]

    >>> color_list(master)
    ['grey', 'magenta', 'blue', 'green', 'tan', 'grey', 'yellow', 'grey', 'grey', 'grey', 'grey']

You can enumerate the remaining membership combinations (there are 2^num_s of them) to assign more colors if you would like; any combination missing from color_map raises a KeyError. Note that the sN's are sets for speed, but could be lists if you need duplicate values (though since you're only searching for single values, I don't know why you would). This is basically the bitmap or truth table method, though expanded to be a little more explicit about what it is doing.
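A minimal Python 3 sketch of enumerating every membership key and counting how many master items fall under each one, which also covers the per-overlap counts the question asks for. The names `subsets`, `all_keys`, `membership_key`, and `key_counts` are illustrative assumptions, not part of the code above.

```python
from collections import Counter
from itertools import product

master = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
subsets = [{1, 2, 3}, {1, 3, 4}, {1, 2, 6}, {2, 3, 4}]  # s1..s4 from the question

# Every possible key: 2 ** len(subsets) tuples of booleans.
all_keys = list(product([False, True], repeat=len(subsets)))

def membership_key(element):
    """Tuple of booleans: (in_s1, in_s2, in_s3, in_s4)."""
    return tuple(element in s for s in subsets)

# How many master items share each membership pattern.
key_counts = Counter(membership_key(m) for m in master)

for key in all_keys:
    if key_counts[key]:
        print(key, key_counts[key])
```

Iterating over `all_keys` is also a convenient way to check that a color map has an entry for every combination before building 30+ graphs with it.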


7 Comments

You can also use the >= operator instead of the issuperset method: if mlist >= s1 and mlist >= s2 and mlist >= s3. (A plain > tests for a proper superset, so it is False when the two sets are equal.)
Ok, I think your code will work. Thank you. Why do you use {} though? Are they not reserved for dictionaries?
@user3358205 {value} can be used to initialize set literals as well, although not an empty set ({} is always an empty dict).
I get an error saying: "AttributeError: 'str' object has no attribute 'issuperset'" Does it matter if the 'master' is in list format ( surrounded by [] ), and the 's' lists are as well? I don't have it broken up like your answer does.
@user3358205 yes, it does. The issuperset method is a member of set, not list or str. You'll have to convert them to sets as you check. But note that sets are unordered, and obviously don't allow duplicates. If order matters, or you have duplicates that matter, then set is not the tool for you.
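A sketch of one way around that AttributeError in Python 3: keep the master as an ordered list of strings, convert only the sublists to sets, and test each string for membership instead of calling issuperset on it. The sequences below are made-up placeholders, not real data from the question.

```python
# Placeholder strings standing in for the amino acid sequences.
master = ['ACD', 'EFG', 'HIK', 'LMN']
s1 = ['ACD', 'EFG']
s2 = ['EFG', 'HIK']

# Convert the sublists once; master stays a list so node order is preserved.
s1_set, s2_set = set(s1), set(s2)

colors = []
for item in master:
    # item is a str, so use membership tests rather than item.issuperset(...)
    in_s1, in_s2 = item in s1_set, item in s2_set
    if in_s1 and in_s2:
        colors.append('magenta')
    elif in_s1:
        colors.append('aquamarine')
    elif in_s2:
        colors.append('gold')
    else:
        colors.append('gray')

print(colors)
```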
1

At its simplest, "iterable" means that it supports the machinery used by, among other things, for loops:

    for x in foo:  # foo is an iterable
        ...

You can make your own objects iterable if you wish; this allows you to use them in for loops, as arguments to functions like map(), or in many other places. More info: https://docs.python.org/2/library/stdtypes.html
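As a small illustration of making your own object iterable, here is a Python 3 sketch. The `Countdown` class is a hypothetical example, not something from the question or the linked documentation; defining `__iter__` is what lets it be used in a for loop or passed to `map()`.

```python
class Countdown:
    """Iterable that yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # A generator method: each call returns a fresh iterator.
        n = self.start
        while n > 0:
            yield n
            n -= 1

for x in Countdown(3):  # works because Countdown defines __iter__
    print(x)

values = list(Countdown(3))
doubled = list(map(lambda n: n * 2, Countdown(3)))
```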


0

Maybe use a lookup table instead

    colorlist = ('gray', 'aquamarine', 'gold', 'gold', 'black')  # ...etc., one entry per bitmap value
    colors = []
    count = [0] * len(colorlist)
    for m in master:
        bitmap = 0
        if m in s1: bitmap += 1
        if m in s2: bitmap += 2
        if m in s3: bitmap += 4
        colors.append(colorlist[bitmap])
        count[bitmap] += 1

Fill in the lookup table with all 32 colors for 5 variables. count will have the number of values for each combination.
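A runnable Python 3 sketch of the same bitmap idea, using the four sublists from the question's example (so 16 table entries rather than 32). The color assignments follow the question's if/elif chain; defaulting every unlisted combination to 'gray' is an assumption, and `bitmap_of` is an illustrative helper name.

```python
from collections import Counter

master = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sublists = [{1, 2, 3}, {1, 3, 4}, {1, 2, 6}, {2, 3, 4}]  # s1..s4

# Bit i is set when the item is in sublists[i]: s1 -> 1, s2 -> 2, s3 -> 4, s4 -> 8.
# Assumption: combinations not named in the question fall back to 'gray'.
colorlist = ['gray'] * (2 ** len(sublists))
colorlist[0b0111] = 'magenta'     # s1, s2, s3
colorlist[0b1101] = 'blue'        # s1, s3, s4
colorlist[0b1011] = 'green'       # s1, s2, s4
colorlist[0b0101] = 'cyan'        # s1, s3
colorlist[0b1010] = 'tan'         # s2, s4
colorlist[0b0001] = 'aquamarine'  # s1 only
colorlist[0b0010] = 'gold'        # s2 only
colorlist[0b0100] = 'yellow'      # s3 only
colorlist[0b1000] = 'black'       # s4 only

def bitmap_of(item):
    """Integer whose set bits record which sublists contain item."""
    return sum(1 << i for i, s in enumerate(sublists) if item in s)

colors = [colorlist[bitmap_of(m)] for m in master]
count = Counter(bitmap_of(m) for m in master)  # items per combination

print(colors)
print(count)
```

The entries of `colors` are plain strings in master order, and each bitmap can be decoded back into the sublists an item came from by checking which bits are set.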

2 Comments

Can you elaborate a little more please, stark? The colors need to be strings to pass to networkx. Would I just replace the int values in colors to color names? I also need to know the original lists that they came from. See edits to original question
Updated with answers.
