
I have a DataFrame which has some unique IDs in two of its columns, for example:

S.no.  Column1  Column2
1      00001x   00002x
2      00003j   00005k
3      00002x   00001x
4      00004d   00008e
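For reference, a minimal sketch of this example data as a pandas DataFrame (column names assumed from the table above):

import pandas as pd

# Reproduce the example data shown above.
df = pd.DataFrame({
    'S.no.': [1, 2, 3, 4],
    'Column1': ['00001x', '00003j', '00002x', '00004d'],
    'Column2': ['00002x', '00005k', '00001x', '00008e'],
})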

The values can be any strings. I want to compare the two columns in such a way that only one of rows S.no. 1 and 3 remains, as these IDs contain the same information; only their order is different.

Basically, if for one row the value in Column1 is X and in Column2 is Y, and for another row the value in Column1 is Y and in Column2 is X, then only one of the rows should remain.

Is that possible in Python?

  • Since you refer to the columns as containing unique IDs, you might want to consider using Pandas MultiIndex. You could then use the sorted tuples from @mozway's answer to index your data. Commented Sep 15, 2021 at 9:05
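A rough sketch of that MultiIndex suggestion (not from the answer below; the level names 'id_a' and 'id_b' are made up for illustration, and df is assumed to be the question's DataFrame):

import pandas as pd

# Key each row by its sorted ID pair so that swapped pairs share the same
# (hypothetical) MultiIndex entry.
keys = df.filter(like='Column').apply(lambda x: tuple(sorted(x)), axis=1)
indexed = df.set_index(pd.MultiIndex.from_tuples(keys.tolist(), names=['id_a', 'id_b']))
# Keeping only the first occurrence of each index key drops the swapped duplicates.
deduped = indexed[~indexed.index.duplicated()]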

1 Answer


You can convert your columns to a frozenset per row.

This gives a common, order-insensitive value per row on which to apply duplicated.

Finally, slice the rows using the previous output as a mask:

mask = df.filter(like='Column').apply(frozenset, axis=1).duplicated()
df[~mask]
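To illustrate why this works (not part of the original answer): each row collapses to an order-insensitive frozenset, so rows 0 and 2 map to the same value and only the second occurrence is flagged.

# Rows 0 and 2 both become frozenset({'00001x', '00002x'}), so duplicated()
# marks only the second one.
pairs = df.filter(like='Column').apply(frozenset, axis=1)
print(pairs.duplicated().tolist())   # expected: [False, False, True, False]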

Previous answer, using set:

mask = df.filter(like='Column').apply(lambda x: tuple(set(x)), axis=1).duplicated()
df[~mask]

NB: using set or sorted requires converting to a tuple (lambda x: tuple(sorted(x))), as the duplicated function hashes the values, which is not possible with mutable objects.
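For completeness, a sketch of that sorted-tuple variant:

# Same idea with order-normalized, hashable tuples instead of frozensets.
mask = df.filter(like='Column').apply(lambda x: tuple(sorted(x)), axis=1).duplicated()
df[~mask]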

output:

   S.no. Column1 Column2
0      1  00001x  00002x
1      2  00003j  00005k
3      4  00004d  00008e

1 Comment

You can use apply(frozenset, axis=1) to shorten it ;)
