Pandas - merge/join/vlookup df and delete all rows that get a match

Question

I am trying to take a list of expired orders from one spreadsheet (df name = data2) and vlookup them against the new orders spreadsheet (df name = data), so that I can delete all the rows that contain expired orders and then return a new spreadsheet (df name = results).

I am having trouble mimicking in pandas what I do in Excel with vlookup/sort/delete. Here are the steps as pseudocode:

1) Import simple.xls as a dataframe called 'data'.
2) Import wo.xlsm, sheet name "T", as a dataframe called 'data2'.
3) Do a vlookup, using Column "A" of 'data' as the values to match against the same values in Column "A" of 'data2' (they're both just order IDs).
4) For every value that exists in Column "A" of 'data2' and also in Column "A" of 'data', group (if necessary) and delete the entire row (there are 26 columns) for each matched order ID. To reiterate: delete the entire row for each match found in the 'data' file, and save the smaller dataset as results.

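    import pandas as pd

    # read_excel() no longer accepts an `encoding` argument, so it is dropped here.
    data = pd.read_excel("ors_simple.xlsx", dtype=object)
    data2 = pd.read_excel("wos.xlsm", sheet_name="T")

    # This is an inner merge: it keeps only the orders found in BOTH files.
    results = data.merge(data2, on='Work_Order')

    # ExcelWriter.save() was removed in pandas 2.0; the context manager closes the file.
    with pd.ExcelWriter('vlookuped.xlsx', engine='xlsxwriter') as writer:
        results.to_excel(writer, sheet_name='Sheet1')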

Which DataFrame contains the values you want dropped, data or data2? And do you need to keep the columns from the lookup table, or do you just want to use it to filter your orders? – user3471881, commented Sep 14, 2018 at 7:54

user3471881 · Accepted Answer · 2018-09-14 11:40:17Z

I re-read your question and think I understand it correctly. You want to find out whether any order in new_orders (you call it data) has expired, using expired_orders (you call it data2).

If you rephrase your question, what you want to do is: 1) find out whether a value in a column of one DataFrame is in a column of another DataFrame, and then 2) drop the rows where the value exists in both.

Using pd.merge is one way to do this. But since you only want to use expired_orders to filter new_orders, pd.merge seems a bit overkill.
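For comparison, here is a minimal sketch of the merge route, using small hypothetical DataFrames (the 'Work_Order' column name comes from the question's code; the data is made up). It shows why the question's default inner merge returns the matching orders instead of removing them, and how a left merge with indicator=True can act as an "anti-join" that keeps only the unmatched rows.

```python
import pandas as pd

# Hypothetical toy data; 'Work_Order' is the shared order-ID column from the question.
new_orders = pd.DataFrame({"Work_Order": [101, 102, 103, 104],
                           "Qty": [5, 3, 8, 1]})
expired_orders = pd.DataFrame({"Work_Order": [102, 104, 999]})

# Default how='inner': keeps only orders present in both frames,
# i.e. exactly the rows that should have been deleted.
matched = new_orders.merge(expired_orders, on="Work_Order")

# Left merge with indicator=True adds a '_merge' column ('left_only', 'right_only', 'both').
# drop_duplicates() stops repeated expired IDs from duplicating rows of new_orders.
flagged = new_orders.merge(expired_orders[["Work_Order"]].drop_duplicates(),
                           on="Work_Order", how="left", indicator=True)

# Keep rows that exist only in new_orders, then discard the helper column.
results = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")

print(matched["Work_Order"].tolist())  # [102, 104]
print(results["Work_Order"].tolist())  # [101, 103]
```

This works, but it builds a merged frame only to throw most of it away, which is why a direct membership test is the simpler tool here.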

Pandas actually has a method for exactly this sort of thing, called isin(), so let's use that! It checks, for each value in one column, whether that value exists in another column.

    df_1['column_name'].isin(df_2['column_name'])

isin() returns a Series of True/False values that you can apply to filter your DataFrame by using it as a mask: df[bool_mask].
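A small sketch of both steps, with hypothetical toy frames standing in for df_1 and df_2: isin() produces the boolean mask, and indexing with that mask keeps only the rows where it is True.

```python
import pandas as pd

# Hypothetical toy frames standing in for df_1 and df_2.
df_1 = pd.DataFrame({"column_name": ["A1", "B2", "C3"], "other": [1, 2, 3]})
df_2 = pd.DataFrame({"column_name": ["B2", "Z9"]})

# For each value in df_1's column, test membership in df_2's column.
# The result is a boolean Series aligned with df_1's index.
bool_mask = df_1["column_name"].isin(df_2["column_name"])
print(bool_mask.tolist())  # [False, True, False]

# Indexing with the mask keeps only the rows where the mask is True.
print(df_1[bool_mask])
```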

So how do you use this in your situation?

    is_expired = new_orders['order_column'].isin(expired_orders['order_column'])
    results = new_orders[~is_expired].copy()  # Use copy() to avoid SettingWithCopyWarning.
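Putting it together, here is a runnable sketch on made-up data. The helper name drop_matching_rows and the sample order IDs are hypothetical; 'order_column' is the placeholder column name used above. The .copy() means later edits to results don't touch new_orders.

```python
import pandas as pd

def drop_matching_rows(new_orders, expired_orders, column):
    """Hypothetical helper: return a copy of new_orders without rows
    whose `column` value also appears in expired_orders[column]."""
    is_expired = new_orders[column].isin(expired_orders[column])
    # ~ inverts the mask; .copy() returns an independent DataFrame.
    return new_orders[~is_expired].copy()

# Made-up sample data.
new_orders = pd.DataFrame({"order_column": ["W-1", "W-2", "W-3", "W-4"],
                           "status": ["open"] * 4})
expired_orders = pd.DataFrame({"order_column": ["W-2", "W-4"]})

results = drop_matching_rows(new_orders, expired_orders, "order_column")
results["status"] = "checked"  # modifies only the copy, not new_orders
print(results)
```

In the question's setting, new_orders and expired_orders would come from pd.read_excel, and the filtered frame could be saved with results.to_excel("results.xlsx", index=False) (the file name is only an example).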

~ is the element-wise equivalent of not, so ~is_expired is True for every order that hasn't expired.
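A quick illustration of why ~ is needed instead of Python's own not (toy Series, made up for the example): ~ flips each element, while not asks for a single truth value for the whole Series, which pandas refuses to give.

```python
import pandas as pd

is_expired = pd.Series([True, False, True])

# ~ inverts every element of a boolean Series.
print((~is_expired).tolist())  # [False, True, False]

# `not` needs one truth value for the whole Series, so pandas raises ValueError.
try:
    not is_expired
except ValueError as err:
    print("ValueError:", err)
```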
