2

How can I compare a particular value in a column with the rest of the values in the same column of a DataFrame?

For example, suppose we have a DataFrame df:

    A  B
    1  1
    2  0
    1  0
    1  1
    2  0

First take column A, then pick its values one by one and compare each with the rest of the values in A. For instance, I take the first value, 1, and compare it with the remaining values [2, 1, 1, 2]; the 3rd and 4th values of the column match it. So the result for 1 should be:

    A
    false
    true
    true
    false

Now pick 2, the second element, and compare it with the remaining values [1, 1, 1, 2]. The output will be:

    A
    false
    false
    false
    true

Basically, compare each element with all the other elements of its column.
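To make the two cases above concrete, here is a minimal sketch for a single column. The DataFrame construction is assumed from the example data, and `compare_with_rest` is a hypothetical helper name used only for illustration:

```python
import pandas as pd

# Example data assumed from the question
df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})

def compare_with_rest(series, position):
    """Compare the value at `position` with every other value in `series`."""
    # Remove the chosen element, then compare what is left with it
    rest = series.drop(series.index[position])
    return rest == series.iloc[position]

first = compare_with_rest(df["A"], 0)
second = compare_with_rest(df["A"], 1)
print(first.tolist())   # [False, True, True, False]
print(second.tolist())  # [False, False, False, True]
```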

The same process should then be repeated for columns B, C, D, and so on.

Could anyone suggest a way to do this?

1
  • What is expected output? Commented Sep 28, 2018 at 11:17

2 Answers

2

You can use a list comprehension that compares each row with all the other rows; the current row is removed with drop:

    df1 = pd.concat([df.drop(i) == x for i, x in enumerate(df.values)], keys=df.index)
    print(df1)
             A      B
    0 1  False  False
      2   True  False
      3   True   True
      4  False  False
    1 0  False  False
      2  False   True
      3  False  False
      4   True   True
    2 0   True  False
      1  False   True
      3   True  False
      4  False   True
    3 0   True   True
      1  False  False
      2   True  False
      4  False  False
    4 0  False  False
      1   True   True
      2  False   True
      3  False  False

Detail:

The list comprehension creates a list of DataFrames:

    print([df.drop(i) == x for i, x in enumerate(df.values)])
    [       A      B
    1  False  False
    2   True  False
    3   True   True
    4  False  False,        A      B
    0  False  False
    2  False   True
    3  False  False
    4   True   True,        A      B
    0   True  False
    1  False   True
    3   True  False
    4  False   True,        A      B
    0   True   True
    1  False  False
    2   True  False
    4  False  False,        A      B
    0  False  False
    1   True   True
    2  False   True
    3  False  False]

These are joined together by concat. The keys parameter builds a MultiIndex, so if necessary each small DataFrame can be selected with loc:

    print(df1.loc[0])
           A      B
    1  False  False
    2   True  False
    3   True   True
    4  False  False
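As an alternative sketch (not the method used above), the same comparison for one column can be written with NumPy broadcasting, which builds the full pairwise equality matrix in one step. The DataFrame construction is assumed from the question, and the names `values`, `equal`, `pairwise` and `row0` are illustrative only:

```python
import numpy as np
import pandas as pd

# Example data assumed from the question
df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})

values = df["A"].to_numpy()
# Pairwise equality: entry (i, j) is True when values[i] == values[j]
equal = values[:, None] == values[None, :]
pairwise = pd.DataFrame(equal, index=df.index, columns=df.index)

# Row 0 without its own (diagonal) entry: element 0 against the rest
row0 = pairwise.loc[0].drop(0)
print(row0.tolist())  # [False, True, True, False]
```

Note that the matrix has n x n entries per column, so this is likely only practical for columns of moderate length.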

1
    import pandas as pd

    rows = []
    # Iterate over all columns
    for column in df.columns:
        # For the current column, iterate over the rows by position
        for line in range(len(df)):
            info = "column: " + str(column) + " - line: " + str(line)
            # Check whether the cells below are equal to the current cell
            answer = df[column].iloc[line + 1:] == df[column].iloc[line]
            # Display the result
            print(info)
            print(answer)
            # Collect the result rows (DataFrame.append was removed in pandas 2.0)
            for idx, check in answer.items():
                rows.append([info, idx, check])

    # Build and display the resulting DataFrame
    df_final = pd.DataFrame(rows, columns=["position", "index", "check"])
    print(df_final)
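The loop above compares each cell only with the cells below it. A possible vectorised sketch of that same "cells below" comparison uses `np.triu_indices` to enumerate the position pairs (i, j) with j > i. The function name `compare_below` and the output column names are hypothetical, and the DataFrame is assumed from the question:

```python
import numpy as np
import pandas as pd

# Example data assumed from the question
df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})

def compare_below(frame):
    """For every column, compare each cell with the cells below it."""
    # All position pairs (i, j) with j > i, in row-major order
    i, j = np.triu_indices(len(frame), k=1)
    parts = []
    for column in frame.columns:
        values = frame[column].to_numpy()
        parts.append(pd.DataFrame({
            "column": column,
            "line": frame.index[i],    # cell being compared
            "index": frame.index[j],   # cell below it
            "check": values[i] == values[j],
        }))
    return pd.concat(parts, ignore_index=True)

result = compare_below(df)
print(result)
```

For a frame with 5 rows this yields 10 pairs per column, one row of `result` per pair.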

1 Comment

Does it fit your needs?
