Comparing column values of Python Pandas Dataframe

Question

How can I compare a particular value in a column with the rest of the values in the same column, within the same DataFrame?

For example, let df be the following DataFrame:

df =

    A  B
    1  1
    2  0
    1  0
    1  1
    2  0

First take column A, then pick its values one by one and compare each with the rest of the values in A. For example, I take the first value, 1, and compare it with the remaining values [2,1,1,2]; the 3rd and 4th values of the column are the same as it. So the result for 1 should be:

    A
    false
    true
    true
    false

Now pick 2, as it is the second element. Its output will be:

    A
    false
    false
    false
    true

Basically, compare each element with all the other elements.

The same process should then be applied to columns B, C, D, and so on.

Could anyone suggest how to do this?

What is expected output?

– jezrael, Commented Sep 28, 2018 at 11:17

jezrael · Accepted Answer · 2018-09-28 11:15:37Z

You can use a list comprehension that compares the values of each row with all the other rows; the current row itself is removed with drop:

    df1 = pd.concat([df.drop(i) == x for i, x in enumerate(df.values)], keys=df.index)
    print (df1)
             A      B
    0 1  False  False
      2   True  False
      3   True   True
      4  False  False
    1 0  False  False
      2  False   True
      3  False  False
      4   True   True
    2 0   True  False
      1  False   True
      3   True  False
      4  False   True
    3 0   True   True
      1  False  False
      2   True  False
      4  False  False
    4 0  False  False
      1   True   True
      2  False   True
      3  False  False

Detail:

The list comprehension creates a list of DataFrames:

    print ([df.drop(i) == x for i, x in enumerate(df.values)])
    [       A      B
    1  False  False
    2   True  False
    3   True   True
    4  False  False,        A      B
    0  False  False
    2  False   True
    3  False  False
    4   True   True,        A      B
    0   True  False
    1  False   True
    3   True  False
    4  False   True,        A      B
    0   True   True
    1  False  False
    2   True  False
    4  False  False,        A      B
    0  False  False
    1   True   True
    2  False   True
    3  False  False]

These are joined together by concat. The keys parameter creates a MultiIndex, if needed, so that each small DataFrame can then be selected with loc:

    print (df1.loc[0])
           A      B
    1  False  False
    2   True  False
    3   True   True
    4  False  False
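For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the sample data in the question, and the names parts and first are only illustrative. Note that enumerate yields row positions while drop expects index labels, so this relies on df having the default 0..n-1 index.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout: two columns, five rows)
df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})

# For each row position i with values x, drop that row and compare the rest with x.
# Assumes the default RangeIndex, so position i is also the label passed to drop.
parts = [df.drop(i) == x for i, x in enumerate(df.values)]

# Stack the pieces; keys=df.index labels each piece with the row it was compared against
df1 = pd.concat(parts, keys=df.index)

# Comparison of row 0 against all the other rows
first = df1.loc[0]
print(first)

# Column A only, as a plain list
print(first["A"].tolist())  # [False, True, True, False]
```

The two lists asked for in the question correspond to df1.loc[0]["A"] and df1.loc[1]["A"].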

Charles R · Accepted Answer · 2018-09-28 10:27:10Z

    import pandas as pd

    rows = []

    # Iterate over all columns
    for column in df.columns:
        # For the current column, iterate over the row positions
        for line in range(len(df[column])):
            info = "column: " + str(column) + " - line: " + str(line)

            # Check whether the cells below are equal to the current cell
            answer = df.loc[df.index > line, column] == df.loc[df.index == line, column].values[0]

            # Display the result
            print(info)
            print(answer)

            # Collect the result rows
            for idx, check in answer.items():
                rows.append([info, idx, check])

    # Build and display the resulting DataFrame
    # (DataFrame.append was removed in pandas 2.0, so the rows are collected in a list first)
    df_final = pd.DataFrame(rows, columns=["position", "index", "check"])
    print(df_final)
