
I have a dataframe like this:

Name Flag part1 part2 part3 part4 part5 part6
Company1 Y Paper Machinery IT Machinery None None
Company2 N IT Business Paper None None None
Company3 N Bio Bio Trades None None None
Company4 N Air Communication Oil Oil Oil None
Company5 Y Business Oil Air Food None None
Company6 N Food Business Paper Bio Air Paper

I need to add a new column "Result" that compares the values in columns part1 through part6: if any two columns in a row contain identical text, the result is True; otherwise it is False. The expected output looks like this:

Name Flag part1 part2 part3 part4 part5 part6 Result
Company1 Y Paper Machinery IT Machinery None None True
Company2 N IT Business Paper None None None False
Company3 N Bio Bio Trades None None None True
Company4 N Air Communication Oil Oil Oil None True
Company5 Y Business Oil Air Food None None False
Company6 N Food Business Paper Bio Air Paper True

Is there any simple way to do it? I tried something like this:

df['Result'] = (df['part1']==df['part2']) | (df['part1']==df['part3']) | (df['part1']==df['part4']) | (df['part1']==df['part5']) | (df['part2']==df['part3']) | (df['part2']==df['part4']) | (df['part2']==df['part5']) |(df['part3']==df['part4']) | (df['part3']==df['part5']) | (df['part4']==df['part5']) 

But this approach is unwieldy and error-prone, and I believe there is a better solution. (In my actual task I have to compare 21 columns.)

  • Which one is part1_ind? Or is it a typo? Commented Jul 14, 2021 at 17:38
  • @Kshitiz sorry I edited it, it was a mistake:) Commented Jul 14, 2021 at 17:41

2 Answers


In your case try

df['out'] = df.filter(like='part').apply(lambda x : x[x!='None'].duplicated().any(),1)
Out[24]:
0     True
1    False
2     True
3     True
4    False
5     True
dtype: bool
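The line above filters out the string 'None'. If the empty cells are real missing values (Python None or NaN) rather than the text 'None', that filter keeps them, and repeated missing values are then counted as duplicates; this is one plausible reason for getting True in every row. Below is a minimal sketch that handles both cases. It assumes the frame from the question with real None values, and the helper name has_repeat is made up for illustration.

```python
import pandas as pd

# The frame from the question, assuming empty cells are real None values.
df = pd.DataFrame({
    "Name": ["Company1", "Company2", "Company3", "Company4", "Company5", "Company6"],
    "Flag": ["Y", "N", "N", "N", "Y", "N"],
    "part1": ["Paper", "IT", "Bio", "Air", "Business", "Food"],
    "part2": ["Machinery", "Business", "Bio", "Communication", "Oil", "Business"],
    "part3": ["IT", "Paper", "Trades", "Oil", "Air", "Paper"],
    "part4": ["Machinery", None, None, "Oil", "Food", "Bio"],
    "part5": [None, None, None, "Oil", None, "Air"],
    "part6": [None, None, None, None, None, "Paper"],
})

parts = df.filter(like="part")

# Without removing missing values, repeated None entries count as duplicates.
naive = parts.apply(lambda row: row.duplicated().any(), axis=1)


def has_repeat(row):
    """Return True if any non-missing value occurs more than once in the row."""
    values = row.dropna()               # drop real None / NaN
    values = values[values != "None"]   # drop the literal text 'None' as well
    return bool(values.duplicated().any())


df["Result"] = parts.apply(has_repeat, axis=1)

print(naive.tolist())         # [True, True, True, True, True, True]
print(df["Result"].tolist())  # [True, False, True, True, False, True]
```

Checking df.dtypes and df['part6'].unique() should show which kind of "None" the real data contains.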

3 Comments

I tried but it shows "True" in every case :(
@AnnaShevtsova What do you mean by True in every case? Probably that's because you have duplicates in all rows. This code here works as expected.
@Onyambu I can't attach the image. But it is really strange, because it must work!

This answer was written against a previous version of the question, in which part1 was compared with all other columns, rather than a search for any duplicate part# values.

Do an index-based equality comparison, then reduce each row with any (this is done by specifying axis='columns', which makes sense but is less than intuitive).

>>> df.filter(regex=r'part[2-9]').eq(df['part1'], axis='index').any(axis='columns')
0     True
1    False
2     True
3     True
4    False
5     True
dtype: bool

Note that I use filter to quickly select the part2 ... part6 columns; they could be specified manually as well. You must pass axis='index' to df.eq and axis='columns' to df.any (or the corresponding ints) to do the comparison and the reduction properly.
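For reference, here is a self-contained sketch of the same part1-versus-the-rest comparison, run on the frame as it currently appears in the question (assuming the empty cells are real None values). The printed result therefore differs from the output above, which came from the earlier data. It selects the other columns by dropping part1 instead of using the regex, because part[2-9] would skip part10 through part19 in a 21-column frame.

```python
import pandas as pd

# The part columns from the current version of the question,
# assuming empty cells are real None values.
df = pd.DataFrame({
    "part1": ["Paper", "IT", "Bio", "Air", "Business", "Food"],
    "part2": ["Machinery", "Business", "Bio", "Communication", "Oil", "Business"],
    "part3": ["IT", "Paper", "Trades", "Oil", "Air", "Paper"],
    "part4": ["Machinery", None, None, "Oil", "Food", "Bio"],
    "part5": [None, None, None, "Oil", None, "Air"],
    "part6": [None, None, None, None, None, "Paper"],
})

# Every part column except part1, however many there are.
others = df.filter(like="part").drop(columns="part1")

# axis='index' aligns the part1 Series with the rows of `others`;
# axis='columns' then collapses each row to a single boolean.
matches_part1 = others.eq(df["part1"], axis="index").any(axis="columns")

print(matches_part1.tolist())  # [False, False, True, False, False, False]
```

Only the third row is True, because only there does part1 itself repeat; duplicates among the other columns are not detected by this comparison.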

4 Comments

I edited the frame to make it clearer. I need to compare all columns, because identical text can appear in, for example, columns 5 and 6, and the result should then be True.
Can you clarify as to how your logic changed? There isn't any difference in outputs here.
I see your edit now, what you're doing is really looking for duplicates: BENY's answer above then would be the best way to do that. There isn't a Series-wise duplicate check attached to DataFrame, so you'll have to apply that duplicate check over rows. (Note also that because None or np.nan repetitions count as duplicates, you need to filter them out.)
I tried this solution but it shows "True" in every case :(
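As a possible alternative that avoids a row-wise apply, one could compare the number of distinct non-missing values in each row with the number of non-missing values: a row has a repeat exactly when the first is smaller. This is a sketch under the assumption that the empty cells are either real None/NaN or the literal text 'None'; the mask step converts the latter to missing.

```python
import pandas as pd

# The part columns from the question, assuming empty cells are real None values.
df = pd.DataFrame({
    "part1": ["Paper", "IT", "Bio", "Air", "Business", "Food"],
    "part2": ["Machinery", "Business", "Bio", "Communication", "Oil", "Business"],
    "part3": ["IT", "Paper", "Trades", "Oil", "Air", "Paper"],
    "part4": ["Machinery", None, None, "Oil", "Food", "Bio"],
    "part5": [None, None, None, "Oil", None, "Air"],
    "part6": [None, None, None, None, None, "Paper"],
})

parts = df.filter(like="part")

# Turn any literal 'None' strings into missing values so they are ignored too.
parts = parts.mask(parts.eq("None"))

# nunique skips missing values by default; count returns the non-missing total.
df["Result"] = parts.nunique(axis=1) < parts.count(axis=1)

print(df["Result"].tolist())  # [True, False, True, True, False, True]
```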
