Compare values from two pandas data frames, order-independent

Question

I am new to data science. I want to check which elements from one data frame exist in another data frame, e.g.

    df1 = [1,2,8,6]
    df2 = [5,2,6,9]

    # for 1 output should be False
    # for 2 output should be True
    # for 6 output should be True

etc.

Note: I have a matrix, not a vector.

I have tried using the following code:

    import pandas as pd
    import numpy as np

    # index_col=None is the pandas default; read_excel has no `index` argument
    priority_dataframe = pd.read_excel(prioritylist_file_path, sheet_name='Sheet1', index_col=None)

    # One lower-cased array of non-null values per priority column
    priority_dict = {column: np.array(priority_dataframe[column].dropna(axis=0, how='all').str.lower())
                     for column in priority_dataframe.columns}

    keys_found_per_sheet = []

    if file_path.lower().endswith('.csv'):
        file_dataframe = pd.read_csv(file_path)
    else:
        file_dataframe = pd.read_excel(file_path, sheet_name=sheet, index_col=None)

    # Collect every non-null cell of the file as a string
    file_cell_array = list()
    for column in file_dataframe.columns:
        for file_cell in np.array(file_dataframe[column].dropna(axis=0, how='all')):
            # was: isinstance(file_cell, str) == 'str', which is always False
            if isinstance(file_cell, str):
                file_cell_array.append(file_cell)
            else:
                file_cell_array.append(str(file_cell))

    converted_file_cell_array = np.array(file_cell_array)

    # Record each priority column that has at least one value in the file
    for key, values in priority_dict.items():
        for priority_cell in values:
            if priority_cell in converted_file_cell_array[:]:
                keys_found_per_sheet.append(key)
                break

Am I doing something wrong in the line if priority_cell in converted_file_cell_array[:]?

Is there a more efficient way to do this?

Can you add some data samples and the expected output? I think an mcve would help.
– jezrael, Commented Apr 3, 2018 at 6:04
Possible duplicate of Confirming equality of two pandas dataframes?
– Jared Wilber, Commented Apr 3, 2018 at 6:08
@JaredWilber, not really, because I want to check whether each element of one data frame exists in another data frame.
– Piyush S. Wanare, Commented Apr 3, 2018 at 6:11
In other words, you want to check if two dataframes have exactly the same elements, but the positions do not matter, right?
– DYZ, Commented Apr 3, 2018 at 6:15
My bad. I think you should further clarify the question; I'm still confused about what you're asking.
– Jared Wilber, Commented Apr 3, 2018 at 6:15

DYZ · Accepted Answer · 2018-04-03 07:00:35Z

2

You can take the .values from each dataframe, convert them to a set(), and take the set intersection.

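    # Flatten each frame to a plain list of cell values and build a set
    set1 = set(df1.values.reshape(-1).tolist())
    set2 = set(df2.values.reshape(-1).tolist())

    # Values that occur in both frames
    common = set1 & set2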
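As a self-contained sketch of the same idea, the snippet below uses the numbers from the question arranged as two small 2x2 frames. The shape, the column names, and the variable names common and only_in_df1 are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: the question's numbers laid out as 2x2 matrices.
df1 = pd.DataFrame([[1, 2], [8, 6]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 2], [6, 9]], columns=["a", "b"])

# Flatten every cell into a plain Python list, then build sets.
set1 = set(df1.to_numpy().ravel().tolist())
set2 = set(df2.to_numpy().ravel().tolist())

common = set1 & set2        # values present in both frames
only_in_df1 = set1 - set2   # values of df1 that df2 lacks

print(sorted(common))       # [2, 6]
print(sorted(only_in_df1))  # [1, 8]
```

The intersection (&) answers "which values exist in both", while the difference (-) lists the values that would come out as False in the question's example.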

edited Apr 3, 2018 at 7:00

answered Apr 3, 2018 at 6:39


5 Comments

Piyush S. Wanare Over a year ago

Got error AttributeError: 'builtin_function_or_method' object has no attribute 'reshape'.

DYZ Over a year ago

Is df1 indeed a DataFrame?

Piyush S. Wanare Over a year ago

Sorry, it was my mistake.

Piyush S. Wanare Over a year ago

I want to know which elements from set1 exist in set2. Will set difference work? I don't think so.

jezrael Over a year ago

@PiyushS.Wanare - Or use the set.intersection function; then the second list does not need to be converted to a set.

jezrael · Accepted Answer · 2018-04-03 06:59:23Z

You can flatten all values of the DataFrames with numpy.ravel and then use set.intersection():

    import pandas as pd

    df1 = pd.DataFrame({'A':list('abcdef'),
                        'B':[4,5,4,5,5,4],
                        'C':[7,8,9,4,2,3],
                        'D':[1,3,5,7,1,0],
                        'E':[5,3,6,9,2,4],
                        'F':list('aaabbb')})

    print(df1)
       A  B  C  D  E  F
    0  a  4  7  1  5  a
    1  b  5  8  3  3  a
    2  c  4  9  5  6  a
    3  d  5  4  7  9  b
    4  e  5  2  1  2  b
    5  f  4  3  0  4  b

    df2 = pd.DataFrame({'A':[2,3,13,4],
                        'Z':list('abfr')})

    print(df2)
        A  Z
    0   2  a
    1   3  b
    2  13  f
    3   4  r

    # Flatten both frames; intersection() accepts any iterable, so only df1 needs to be a set
    L = list(set(df1.values.ravel()).intersection(df2.values.ravel()))
    print(L)
    ['f', 2, 3, 4, 'a', 'b']

I have already checked this, but I want True/False output for each element so that I can do further processing on it.
@PiyushS.Wanare - What is the expected output? Would a dictionary of booleans be correct?
@PiyushS.Wanare - then list(set(a).intersection(b)) should work.
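
For the per-element True/False output requested in these comments, one possible approach (not taken from either answer) is DataFrame.isin, optionally combined with a dictionary comprehension. The sketch below assumes the question's numbers arranged as 2x2 frames; the names lookup, mask, and found are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: the question's numbers laid out as 2x2 matrices.
df1 = pd.DataFrame([[1, 2], [8, 6]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 2], [6, 9]], columns=["a", "b"])

# All distinct values of df2, regardless of position.
lookup = set(df2.to_numpy().ravel().tolist())

# Boolean frame with the same shape as df1:
# True where the cell value occurs anywhere in df2.
mask = df1.isin(list(lookup))

# Dictionary of booleans keyed by each value of df1.
found = {value: value in lookup for value in df1.to_numpy().ravel().tolist()}

print(mask)
print(found)  # {1: False, 2: True, 8: False, 6: True}
```

The mask keeps the matrix layout, which may be convenient for filtering df1, while the dictionary gives a flat value-to-boolean lookup.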
