
I have two dataframes like this,

    >>df1
      name  key1 key2   A   B
    0   a1     1   K0  A0  B0
    1   a2     2   K1  A1  B1
    2   a3     3   K0  A2  B2
    3   a3     4   K1  A3  B3

    >>df2
       key1 key2
    0     1   K0
    1     2   K0
    2     3   0K
    3     4   1K
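For anyone who wants to reproduce the problem, the two frames shown above could be built roughly like this. It is a minimal sketch that assumes `key1` is an integer column and every other column holds strings; the post itself does not state the dtypes.

```python
import pandas as pd

# Assumption: key1 is numeric, everything else is a string column.
df1 = pd.DataFrame({
    "name": ["a1", "a2", "a3", "a3"],
    "key1": [1, 2, 3, 4],
    "key2": ["K0", "K1", "K0", "K1"],
    "A": ["A0", "A1", "A2", "A3"],
    "B": ["B0", "B1", "B2", "B3"],
})

df2 = pd.DataFrame({
    "key1": [1, 2, 3, 4],
    "key2": ["K0", "K0", "0K", "1K"],
})

print(df1)
print(df2)
```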

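I need to compare the (key1, key2) pairs of df1 with those of df2 and print the matching rows. A row of df1 matches when its (key1, key2) pair equals a (key1, key2) pair in df2, or equals a df2 pair with key2 reversed. In other words: df1[['key1','key2']] == df2[['key1','key2']] or df1[['key1','key2']] == df2[['key1', reverse('key2')]].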

Expected Output:

    >>df3
    name  key1  key2   A   B
    a1       1    K0  A0  B0
    a3       3    K0  A2  B2
    a3       4    K1  A3  B3

    >>df4
    name  key1   key2      A      B
    a1       1     K0     A0     B0
    a3     3,4  K0,K1  A2,A3  B2,B3
  • Kindly post your expected output. You can also include what you tried. Commented Sep 4, 2021 at 10:13

3 Answers

2

Try:

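    import pandas as pd

    # Rows of df1 whose (key1, key2) pair appears in df2 as-is.
    x = df1.merge(df2, on=["key1", "key2"])

    # Reverse key2 in df2 (this overwrites df2["key2"]) and match again.
    df2["key2"] = df2["key2"].str[::-1]
    y = df1.merge(df2, on=["key1", "key2"])

    df3 = pd.concat([x, y])

    # Collapse rows that share a name into comma-separated strings.
    df4 = (
        df3.assign(key1=df3.key1.astype(str))
        .groupby("name", as_index=False)
        .agg(", ".join)
    )

    print(df3)
    print(df4)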

Prints:

      name  key1 key2   A   B
    0   a1     1   K0  A0  B0
    0   a3     3   K0  A2  B2
    1   a3     4   K1  A3  B3

      name  key1    key2       A       B
    0   a1     1      K0      A0      B0
    1   a3  3, 4  K0, K1  A2, A3  B2, B3
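The snippet above overwrites `df2["key2"]` and concatenates two merges, so a palindromic `key2` would presumably show up twice in `df3`. One possible variation, sketched here under the assumption that the sample data looks as in the question, builds a single de-duplicated lookup table and merges once. The name `keys` is only illustrative.

```python
import pandas as pd

# Sample data as shown in the question (dtypes are an assumption).
df1 = pd.DataFrame({
    "name": ["a1", "a2", "a3", "a3"],
    "key1": [1, 2, 3, 4],
    "key2": ["K0", "K1", "K0", "K1"],
    "A": ["A0", "A1", "A2", "A3"],
    "B": ["B0", "B1", "B2", "B3"],
})
df2 = pd.DataFrame({"key1": [1, 2, 3, 4], "key2": ["K0", "K0", "0K", "1K"]})

# Lookup table: every df2 pair as-is plus the same pair with key2 reversed.
# df2 itself is left untouched; duplicates (e.g. palindromes) are dropped.
reversed_pairs = df2.assign(key2=df2["key2"].str[::-1])
keys = pd.concat([df2, reversed_pairs])[["key1", "key2"]].drop_duplicates()

# An inner merge keeps only the df1 rows whose pair is in the lookup table.
df3 = df1.merge(keys, on=["key1", "key2"])

# Collapse rows that share a name into comma-separated strings.
df4 = df3.astype(str).groupby("name", as_index=False).agg(", ".join)

print(df3)
print(df4)
```

Because the merge runs once against unique pairs, each matching row of `df1` appears at most once in `df3`.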

2

Here is another approach:

  1. Get ord for each character in key2 and sum them up to create a helper column.

  2. Then use this column in the merge. This removes the need to reverse the string.


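    # Helper key: sum of the character codes of a string, so "K0" and "0K" agree.
    f = lambda x: sum(map(ord, x))

    df4 = (
        df1.merge(
            df2,
            left_on=["key1", df1["key2"].map(f)],
            right_on=["key1", df2["key2"].map(f)],
            suffixes=("", "_y"),
        )
        .loc[:, df1.columns]  # keep only the original df1 columns
        .groupby("name", as_index=False)
        .agg(lambda x: ", ".join(x.map(str)))
    )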

    print(df4)

      name  key1    key2       A       B
    0   a1     1      K0      A0      B0
    1   a3  3, 4  K0, K1  A2, A3  B2, B3

Note that you receive df3 if you remove the groupby operation from the code above.
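One caveat with a sum-of-`ord` helper is that it matches any two strings whose character codes add up to the same total, not only a string and its reverse (for instance `"AD"` and `"BC"`). If that matters for your data, a stricter helper key might be the smaller of the string and its reverse. The sketch below assumes the sample data from the question; `canon`, `ord_sum` and the temporary column `_k` are illustrative names, not part of the answer above.

```python
import pandas as pd

# Sample data as shown in the question (dtypes are an assumption).
df1 = pd.DataFrame({
    "name": ["a1", "a2", "a3", "a3"],
    "key1": [1, 2, 3, 4],
    "key2": ["K0", "K1", "K0", "K1"],
    "A": ["A0", "A1", "A2", "A3"],
    "B": ["B0", "B1", "B2", "B3"],
})
df2 = pd.DataFrame({"key1": [1, 2, 3, 4], "key2": ["K0", "K0", "0K", "1K"]})


def ord_sum(s: str) -> int:
    """Helper key from the answer above: sum of character codes."""
    return sum(map(ord, s))


def canon(s: str) -> str:
    """Hypothetical stricter key: equal only for a string and its reverse."""
    return min(s, s[::-1])


# The sum collides for unrelated strings; the canonical form does not.
print(ord_sum("AD") == ord_sum("BC"))
print(canon("AD") == canon("BC"))

# Merge on key1 plus the canonical form of key2, then drop the helper column.
left = df1.assign(_k=df1["key2"].map(canon))
right = df2.assign(_k=df2["key2"].map(canon))[["key1", "_k"]].drop_duplicates()
df3 = left.merge(right, on=["key1", "_k"]).drop(columns="_k")

# Same grouping step as in the answer above.
df4 = df3.groupby("name", as_index=False).agg(lambda x: ", ".join(x.map(str)))

print(df3)
print(df4)
```

As in the answer above, leaving out the `groupby` step gives `df3`.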


1

Let us try with MultiIndex.isin

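    # (key1, key2) pairs of df1, of df2, and of df2 with key2 reversed.
    i1 = df1.set_index(['key1', 'key2']).index
    i2 = df2.set_index(['key1', 'key2']).index
    i3 = df2.set_index(['key1', df2['key2'].str[::-1]]).index

    # Keep the df1 rows whose pair appears in either df2 index.
    df3 = df1[i1.isin(i2.union(i3))]

    # Collapse rows that share a name into comma-separated strings.
    df4 = df3.astype(str).groupby('name', as_index=False).agg(','.join)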

    print(df3)

      name  key1 key2   A   B
    0   a1     1   K0  A0  B0
    2   a3     3   K0  A2  B2
    3   a3     4   K1  A3  B3

    print(df4)

      name key1   key2      A      B
    0   a1    1     K0     A0     B0
    1   a3  3,4  K0,K1  A2,A3  B2,B3

