3

I have two dataframes df1 and df2 df1 contains month and two date columns

df1

Month Month_Start Month_End Month1 2022-03-27 2022-04-30 Month2 2022-05-01 2022-05-28 Month3 2022-05-01 2022-06-25 

another data frame df2

start_Month end_Month price 2022-03-27 2260-12-31 1 2022-03-27 2260-12-31 2 2022-03-27 2260-12-31 3 

if Month_Start and Month_end of df1 is in between start_Month and end_Month of df2, assign price column value to Month column of df1

like following result

Month price Month1 1 Month2 1 Month3 1 

I tried using for loops

for i in range(len(df2)): for j in range(len(df1)): if df2['start_Month'][i] <= df1['Month_Start'][j]<= df1['Month_End'][j] <= df2['end_Month'][i]: new.loc[len(new.index)] = [df1['month'][j], df2['price'][i]] 

but taking lot of time for execution for 1000+ rows.

ANY IDEAS?

2 Answers 2

0

Is there a common column where you can combine these two dataframes? such as id. If there is, it would be much more accurate to apply the conditions after combining these two tables. You can try the code below based on current data and conditions (Dataframes that are not the same size may have a problem.).

import pandas as pd import numpy as np df1=pd.DataFrame(data={'Month':['Month1','Month2','Month3'], 'Month_Start':['2022-03-27','2022-05-01','2022-05-01'], 'Month_End':['2022-04-30','2022-05-28','2022-06-25']}) df2=pd.DataFrame(data={'start_Month':['2022-03-27','2022-03-27','2022-03-27'], 'end_Month':['2260-12-31','2260-12-31','2260-12-31'], 'price':[1,2,3]}) con=[(df1['Month_Start']>= df2['start_Month']) & (df1['Month_End']<= df2['end_Month'])] cho=[df2['price']] df1['price']=np.select(con,cho,default=np.nan)# 
Sign up to request clarification or add additional context in comments.

Comments

0

Assuming these are your dataframes:

import pandas as pd df1 = pd.DataFrame({ 'Month': ['Month1', 'Month2', 'Month3'], 'Month_Start': ['2022-03-27', '2022-05-01', '2022-05-01'], 'Month_End': ['2022-04-30', '2022-05-28', '2022-06-25'] }) df1['Month_Start'] = pd.to_datetime(df1['Month_Start']) df1['Month_End'] = pd.to_datetime(df1['Month_End']) df2 = pd.DataFrame({ 'start_Month': ['2022-03-01', '2022-05-01', '2022-06-01'], 'end_Month': ['2022-04-30', '2022-05-30', '2022-06-30'], 'price': [1, 2, 3] }) df2['start_Month'] = pd.to_datetime(df2['start_Month']) df2['end_Month'] = pd.to_datetime(df2['end_Month']) print(df1) Month Month_Start Month_End 0 Month1 2022-03-27 2022-04-30 1 Month2 2022-05-01 2022-05-28 2 Month3 2022-05-01 2022-06-25 print(df2) #note validity periods do not overlap, so only 1 price is valid! start_Month end_Month price 0 2022-03-01 2022-04-30 1 1 2022-05-01 2022-05-30 2 2 2022-06-01 2022-06-30 3 

I would define an external function to check the validity period, then return the corresponding price. Note that if more than 1 corresponding validity periods are found, the first one will be returned. If no corresponding period is found, a null value is returned.

def check_validity(row): try: return int(df2['price'][(df2['start_Month']<=row['Month_Start']) & (row['Month_End']<=df2['end_Month'])].values[0]) except: return df1['price'] = df1.apply(lambda x: check_validity(x), axis=1) print(df1) 

Output:

 Month Month_Start Month_End price 0 Month1 2022-03-27 2022-04-30 1.0 1 Month2 2022-05-01 2022-05-28 2.0 2 Month3 2022-05-01 2022-06-25 NaN 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.