I need to do a fuzzy groupby where a single record can be in one or more groups.
I have a DataFrame like this:
    import pandas as pd

    test = pd.DataFrame({'score1': pd.Series(['a', 'b', 'c', 'd', 'e']),
                         'score2': pd.Series(['b', 'a', 'k', 'n', 'c'])})

Output:
      score1 score2
    0      a      b
    1      b      a
    2      c      k
    3      d      n
    4      e      c

I wish to have groups like this:
The group keys should be the union of the unique values between score1 and score2. Record 0 should be in groups a and b because it contains both score values. Similarly record 1 should be in groups b and a; record 2 should be in groups c and k and so on.
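To make the target concrete, here is a brute-force sketch that builds the mapping I'm after with a plain Python loop. The name `desired_groups` is just for illustration, and this only shows the expected result; what I'm looking for is the idiomatic pandas way to get the same thing.

```python
import pandas as pd

test = pd.DataFrame({
    'score1': pd.Series(['a', 'b', 'c', 'd', 'e']),
    'score2': pd.Series(['b', 'a', 'k', 'n', 'c']),
})

# Illustrative only: every distinct value from score1 or score2 becomes a key,
# and a row's index label is listed under each value that row contains.
desired_groups = {}
for idx, row in test.iterrows():
    for value in (row['score1'], row['score2']):
        members = desired_groups.setdefault(value, [])
        if idx not in members:  # avoid duplicates when both columns match
            members.append(idx)

print(desired_groups)
```

With the data above, group 'a' holds records 0 and 1, group 'c' holds records 2 and 4, and group 'k' holds only record 2. The original index labels are kept as the group members.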
I've tried doing a groupby on two columns like this:
    In [192]: score_groups = test.groupby(['score1', 'score2'])

However, I get the group keys as tuples, such as ('a', 'b'), ('b', 'a'), ('c', 'k'), etc., instead of unique group keys where records can be in multiple groups. The output is shown below:
    In [192]: score_groups.groups
    Out[192]:
    {('a', 'b'): [0],
     ('b', 'a'): [1],
     ('c', 'k'): [2],
     ('d', 'n'): [3],
     ('e', 'c'): [4]}

Also, I need the indexes preserved because I'm using them for another operation later. Please help!