compare a two date columns of a data frame with another two data frames of second data frame in python

Question

I have two dataframes df1 and df2 df1 contains month and two date columns

df1

Month Month_Start Month_End Month1 2022-03-27 2022-04-30 Month2 2022-05-01 2022-05-28 Month3 2022-05-01 2022-06-25

another data frame df2

start_Month end_Month price 2022-03-27 2260-12-31 1 2022-03-27 2260-12-31 2 2022-03-27 2260-12-31 3

if Month_Start and Month_end of df1 is in between start_Month and end_Month of df2, assign price column value to Month column of df1

like following result

Month price Month1 1 Month2 1 Month3 1

I tried using for loops

for i in range(len(df2)): for j in range(len(df1)): if df2['start_Month'][i] <= df1['Month_Start'][j]<= df1['Month_End'][j] <= df2['end_Month'][i]: new.loc[len(new.index)] = [df1['month'][j], df2['price'][i]]

but taking lot of time for execution for 1000+ rows.

ANY IDEAS?

Bushmaster · Accepted Answer · 2022-10-03 09:41:36Z

Is there a common column where you can combine these two dataframes? such as id. If there is, it would be much more accurate to apply the conditions after combining these two tables. You can try the code below based on current data and conditions (Dataframes that are not the same size may have a problem.).

import pandas as pd import numpy as np df1=pd.DataFrame(data={'Month':['Month1','Month2','Month3'], 'Month_Start':['2022-03-27','2022-05-01','2022-05-01'], 'Month_End':['2022-04-30','2022-05-28','2022-06-25']}) df2=pd.DataFrame(data={'start_Month':['2022-03-27','2022-03-27','2022-03-27'], 'end_Month':['2260-12-31','2260-12-31','2260-12-31'], 'price':[1,2,3]}) con=[(df1['Month_Start']>= df2['start_Month']) & (df1['Month_End']<= df2['end_Month'])] cho=[df2['price']] df1['price']=np.select(con,cho,default=np.nan)#

blackraven · Accepted Answer · 2022-10-04 01:24:44Z

Assuming these are your dataframes:

import pandas as pd df1 = pd.DataFrame({ 'Month': ['Month1', 'Month2', 'Month3'], 'Month_Start': ['2022-03-27', '2022-05-01', '2022-05-01'], 'Month_End': ['2022-04-30', '2022-05-28', '2022-06-25'] }) df1['Month_Start'] = pd.to_datetime(df1['Month_Start']) df1['Month_End'] = pd.to_datetime(df1['Month_End']) df2 = pd.DataFrame({ 'start_Month': ['2022-03-01', '2022-05-01', '2022-06-01'], 'end_Month': ['2022-04-30', '2022-05-30', '2022-06-30'], 'price': [1, 2, 3] }) df2['start_Month'] = pd.to_datetime(df2['start_Month']) df2['end_Month'] = pd.to_datetime(df2['end_Month']) print(df1) Month Month_Start Month_End 0 Month1 2022-03-27 2022-04-30 1 Month2 2022-05-01 2022-05-28 2 Month3 2022-05-01 2022-06-25 print(df2) #note validity periods do not overlap, so only 1 price is valid! start_Month end_Month price 0 2022-03-01 2022-04-30 1 1 2022-05-01 2022-05-30 2 2 2022-06-01 2022-06-30 3

I would define an external function to check the validity period, then return the corresponding price. Note that if more than 1 corresponding validity periods are found, the first one will be returned. If no corresponding period is found, a null value is returned.

def check_validity(row): try: return int(df2['price'][(df2['start_Month']<=row['Month_Start']) & (row['Month_End']<=df2['end_Month'])].values[0]) except: return df1['price'] = df1.apply(lambda x: check_validity(x), axis=1) print(df1)

Output:

 Month Month_Start Month_End price 0 Month1 2022-03-27 2022-04-30 1.0 1 Month2 2022-05-01 2022-05-28 2.0 2 Month3 2022-05-01 2022-06-25 NaN

Collectives™ on Stack Overflow

compare a two date columns of a data frame with another two data frames of second data frame in python

2 Answers 2

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Related