Pandas compare many columns with each other and if any two are identical - true

Question

I have a dataframe like this:

Name	Flag	part1	part2	part3	part4	part5	part6
Company1	Y	Paper	Machinery	IT	Machinery	None	None
Company2	N	IT	Business	Paper	None	None	None
Company3	N	Bio	Bio	Trades	None	None	None
Company4	N	Air	Communication	Oil	Oil	Oil	None
Company5	Y	Business	Oil	Air	Food	None	None
Company6	N	Food	Business	Paper	Bio	Air	Paper

I need to add a new column "Result" that compares the values in columns part1 to part6: if the text in any two of these columns is identical, the result is True; otherwise it is False. It should look like this:

Name	Flag	part1	part2	part3	part4	part5	part6	Result
Company1	Y	Paper	Machinery	IT	Machinery	None	None	True
Company2	N	IT	Business	Paper	None	None	None	False
Company3	N	Bio	Bio	Trades	None	None	None	True
Company4	N	Air	Communication	Oil	Oil	Oil	None	True
Company5	Y	Business	Oil	Air	Food	None	None	False
Company6	N	Food	Business	Paper	Bio	Air	Paper	True

Is there any simple way to do it? I tried something like this:

df['Result'] = (
    (df['part1']==df['part2']) | (df['part1']==df['part3']) | (df['part1']==df['part4']) | (df['part1']==df['part5'])
    | (df['part2']==df['part3']) | (df['part2']==df['part4']) | (df['part2']==df['part5'])
    | (df['part3']==df['part4']) | (df['part3']==df['part5'])
    | (df['part4']==df['part5'])
)

But this is unwieldy, and I believe there is a better solution. (In my actual task I have to compare 21 columns.)
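For reference, the pairwise idea in the attempt above can be generated rather than typed out, which scales to 21 columns. This is only a sketch: it assumes the empty slots hold the literal string "None" (as the printed frame suggests), and the names part_cols, filled and result are illustrative, not from the question.

```python
from itertools import combinations

import pandas as pd

# Sample frame from the question; empty slots are ASSUMED to be the string "None".
df = pd.DataFrame({
    "Name": [f"Company{i}" for i in range(1, 7)],
    "Flag": ["Y", "N", "N", "N", "Y", "N"],
    "part1": ["Paper", "IT", "Bio", "Air", "Business", "Food"],
    "part2": ["Machinery", "Business", "Bio", "Communication", "Oil", "Business"],
    "part3": ["IT", "Paper", "Trades", "Oil", "Air", "Paper"],
    "part4": ["Machinery", "None", "None", "Oil", "Food", "Bio"],
    "part5": ["None", "None", "None", "Oil", "None", "Air"],
    "part6": ["None", "None", "None", "None", "None", "Paper"],
})

part_cols = [c for c in df.columns if c.startswith("part")]
filled = df[part_cols].ne("None")  # True where a real value is present

# OR together every pair of columns; a match only counts if the value is not
# the placeholder (if a == b and a is filled, b is necessarily filled too).
result = pd.Series(False, index=df.index)
for a, b in combinations(part_cols, 2):
    result |= df[a].eq(df[b]) & filled[a]

df["Result"] = result
print(df["Result"].tolist())  # [True, False, True, True, False, True]
```

With 21 columns this loops over 210 pairs, each a vectorised column comparison, so it stays cheap in the number of rows.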

Which one is part1_ind? Or is it a typo?

– imxitiz, Commented Jul 14, 2021 at 17:38
@Kshitiz Sorry, I edited it; it was a mistake :)

– Anna Shevtsova, Commented Jul 14, 2021 at 17:41

BENY · Accepted Answer · 2021-07-14 17:45:05Z

3

In your case try

df['out'] = df.filter(like='part').apply(lambda x : x[x!='None'].duplicated().any(),1)
Out[24]:
0     True
1    False
2     True
3     True
4    False
5     True
dtype: bool
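A self-contained version of the same row-wise duplicate check, for anyone who wants to run it. It is a sketch under one assumption: the empty slots are the literal string "None". The helper name has_repeat is made up for readability.

```python
import pandas as pd

# Sample frame from the question; empty slots are ASSUMED to be the string "None".
df = pd.DataFrame({
    "Name": [f"Company{i}" for i in range(1, 7)],
    "Flag": ["Y", "N", "N", "N", "Y", "N"],
    "part1": ["Paper", "IT", "Bio", "Air", "Business", "Food"],
    "part2": ["Machinery", "Business", "Bio", "Communication", "Oil", "Business"],
    "part3": ["IT", "Paper", "Trades", "Oil", "Air", "Paper"],
    "part4": ["Machinery", "None", "None", "Oil", "Food", "Bio"],
    "part5": ["None", "None", "None", "Oil", "None", "Air"],
    "part6": ["None", "None", "None", "None", "None", "Paper"],
})


def has_repeat(row: pd.Series) -> bool:
    """Return True if any non-placeholder value occurs more than once in the row."""
    values = row[row != "None"]  # drop the "None" placeholders first
    return bool(values.duplicated().any())


# filter(like="part") keeps every column whose name contains "part";
# axis=1 hands each row to has_repeat as a Series.
df["Result"] = df.filter(like="part").apply(has_repeat, axis=1)
print(df[["Name", "Result"]])
```

Because filter(like='part') selects by name, the same line should work unchanged with 21 part columns, provided no unrelated column name contains "part".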


3 Comments

Anna Shevtsova Over a year ago

I tried but it shows "True" in every case :(

Onyambu Over a year ago

@AnnaShevtsova What do you mean by True in every case? That is probably because you have duplicates in every row. The code here works as expected.

Anna Shevtsova Over a year ago

@Onyambu I can't attach the image. But it is really strange, because it should work!

ifly6 · Accepted Answer · 2021-07-14 18:11:28Z

This answer was written against a previous version of the question in which part1 was compared with all other columns, rather than a search for any duplicate part# values.

Do an index-aligned equality comparison, then reduce with any across each row (this is done by specifying axis='columns', which makes sense-ish but is less than intuitive).

>>> df.filter(regex=r'part[2-9]').eq(df['part1'], axis='index').any(axis='columns')
0     True
1    False
2     True
3     True
4    False
5     True
dtype: bool

Note that I use filter to quickly select the part2 ... part6 columns; these could be specified manually as well. You must pass axis='index' to df.eq and axis='columns' to df.any (or the corresponding ints) for the comparison and reduction to work properly.
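A runnable sketch of this part1-versus-the-rest comparison on the frame as it currently appears in the question. Two things here are my own choices rather than part of the answer: the other columns are selected with filter(like='part') plus drop, since the regex part[2-9] would not match names such as part10 in a 21-column frame, and the variable names are illustrative. On the edited data the result differs from the output printed above, because only matches with part1 count.

```python
import pandas as pd

# Sample frame from the question; empty slots are ASSUMED to be the string "None".
df = pd.DataFrame({
    "part1": ["Paper", "IT", "Bio", "Air", "Business", "Food"],
    "part2": ["Machinery", "Business", "Bio", "Communication", "Oil", "Business"],
    "part3": ["IT", "Paper", "Trades", "Oil", "Air", "Paper"],
    "part4": ["Machinery", "None", "None", "Oil", "Food", "Bio"],
    "part5": ["None", "None", "None", "Oil", "None", "Air"],
    "part6": ["None", "None", "None", "None", "None", "Paper"],
})

# Every part column except part1.
others = df.filter(like="part").drop(columns="part1")

# axis="index" aligns the part1 Series with the row labels, so each row's
# part1 value is compared with that row's other columns;
# any(axis="columns") then collapses each row to a single bool.
matches_part1 = others.eq(df["part1"], axis="index").any(axis="columns")
print(matches_part1.tolist())  # [False, False, True, False, False, False]
```

Only Company3 (Bio, Bio) repeats its part1 value, which is why this approach does not answer the edited question.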

I edited the frame for clarity. I need to compare all columns, because identical text can appear in, say, columns 5 and 6, and the result should then be True :)
Can you clarify how your logic changed? There isn't any difference in outputs here.
I see your edit now. What you're really doing is looking for duplicates, so BENY's answer above would be the best way to do that. There isn't a row-wise (Series-wise) duplicate check attached to DataFrame, so you'll have to apply that duplicate check over rows. (Note also that because None or np.nan repetitions count as duplicates, you need to filter them out.)
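The note about None or np.nan is one plausible explanation for getting True in every row, though the thread never confirms it: if the empty slots are real missing values rather than the string "None", the x != 'None' filter removes nothing and the repeated NaNs are themselves flagged as duplicates. A sketch of that situation, with the missing values assumed to be np.nan and all variable names illustrative:

```python
import numpy as np
import pandas as pd

# Same part columns as the question, but with REAL missing values (an assumption).
parts = pd.DataFrame({
    "part1": ["Paper", "IT", "Bio", "Air", "Business", "Food"],
    "part2": ["Machinery", "Business", "Bio", "Communication", "Oil", "Business"],
    "part3": ["IT", "Paper", "Trades", "Oil", "Air", "Paper"],
    "part4": ["Machinery", np.nan, np.nan, "Oil", "Food", "Bio"],
    "part5": [np.nan, np.nan, np.nan, "Oil", np.nan, "Air"],
    "part6": [np.nan, np.nan, np.nan, np.nan, np.nan, "Paper"],
})

# Filtering on the string "None" leaves the NaNs in place, and
# duplicated() treats repeated NaNs as duplicates of each other.
naive = parts.apply(lambda x: x[x != "None"].duplicated().any(), axis=1)
print(naive.tolist())  # [True, True, True, True, True, True]

# Dropping missing values first restores the intended result.
fixed = parts.apply(lambda x: x.dropna().duplicated().any(), axis=1)
print(fixed.tolist())  # [True, False, True, True, False, True]

# Vectorised alternative: a row has a repeat when it holds fewer distinct
# values than non-missing values (nunique ignores NaN by default).
vectorised = parts.nunique(axis=1) < parts.notna().sum(axis=1)
print(vectorised.tolist())  # [True, False, True, True, False, True]
```

If a frame mixes both kinds of placeholder, converting the strings first, for example with parts.replace("None", np.nan), should let either of the last two variants be used as is.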
