Assume I have two pandas DataFrames, each with one datetime column, as follows:

```
Col_1        Col_2
2023-01-02   2023-01-02 9:00
2023-01-03   2023-01-02 20:00
2023-01-04   2023-01-03 1:00
2023-01-05   2023-01-04 9:00
2023-01-08   2023-01-05 16:00
             2023-01-06 9:00
             2023-01-08 12:00
```

I want to map each Col_2 datetime onto a date from Col_1 under the following conditions:

If the time is before 12 pm, the value keeps the same date when Col_1 contains that date; if it does not, the value moves to the next date in Col_1. If the time is at or after 12 pm, the value moves to the next date in Col_1.

However, if the last value is at or after 12 pm and Col_1 has no later date, just add one day to the last date in Col_1.

For example, I expect Col_2 to look like this:

```
Col_1        Col_2
2023-01-02   2023-01-02
2023-01-03   2023-01-03
2023-01-04   2023-01-03
2023-01-05   2023-01-04
2023-01-08   2023-01-08
             2023-01-08
             2023-01-09
```

Is there a pandas function that does this? If not, how can I write the code?

  • Are these two columns from two different DataFrames? Commented Feb 24, 2023 at 18:01
  • Yes, they are from different DataFrames. Commented Feb 24, 2023 at 18:20

1 Answer

You can write a custom function that looks up the date in df1, and then apply it to df2['Col_2']:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Col_1': pd.to_datetime([
        '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-08'
    ])
})
df2 = pd.DataFrame({
    'Col_2': pd.to_datetime([
        '2023-01-02 9:00', '2023-01-02 20:00', '2023-01-03 1:00',
        '2023-01-04 9:00', '2023-01-05 16:00', '2023-01-06 9:00',
        '2023-01-08 12:00'
    ])
})

## create a function that takes a datetime as an input,
## and returns the correct datetime from df1 'Col_1' as a reference
## (df1 is expected to be sorted, with the default 0..n-1 index)
def get_date_from_df1(ts, df1):
    day = ts.floor(freq='D')
    ts_in_df1 = sum(df1['Col_1'] == day)
    ## get index of the ts:
    if ts_in_df1 == 1:
        idx = df1.loc[df1['Col_1'] == day].index[0]
        ## at or after 12pm, move to the next date in df1
        if ts.hour >= 12:
            idx += 1
            ## no later date in df1: add one day to the last date
            if idx > df1.index[-1]:
                return ts.ceil(freq='D')
    ## the date is missing from df1: use the first later date, if any
    elif sum(df1['Col_1'] > day) >= 1:
        idx = df1.loc[df1['Col_1'] > day].index[0]
    ## if the date isn't found in df1, we return None (to avoid an error)
    else:
        return None
    return df1['Col_1'].iloc[idx]

df2['Col_2'] = df2['Col_2'].apply(lambda ts: get_date_from_df1(ts, df1))
```

Result:

```
>>> df2
       Col_2
0 2023-01-02
1 2023-01-03
2 2023-01-03
3 2023-01-04
4 2023-01-08
5 2023-01-08
6 2023-01-09
```
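If the row-by-row apply turns out to be slow on larger data, the same rule can probably be expressed without a Python loop. The sketch below is only one possible way to do it, not a built-in pandas feature: the function name `snap_to_reference_dates` is made up for illustration, and it assumes the Col_1 values are plain dates at midnight. Adding 12 hours before flooring keeps the date for times before 12 pm and bumps it to the next calendar day otherwise, so a single `numpy.searchsorted` call can find the first Col_1 date that is not earlier. It also assumes that any timestamp falling past the last Col_1 date should get "last date plus one day", which goes slightly beyond what the question states.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Col_1': pd.to_datetime([
    '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-08',
])})
df2 = pd.DataFrame({'Col_2': pd.to_datetime([
    '2023-01-02 9:00', '2023-01-02 20:00', '2023-01-03 1:00',
    '2023-01-04 9:00', '2023-01-05 16:00', '2023-01-06 9:00',
    '2023-01-08 12:00',
])})


def snap_to_reference_dates(timestamps, reference):
    """Hypothetical helper: map each timestamp to a date from `reference`."""
    # Sorted array of reference dates (assumed to be at midnight).
    ref = np.sort(reference.to_numpy())
    # Before 12 pm the date is unchanged; at or after 12 pm it becomes
    # the next calendar day.
    earliest = (timestamps + pd.Timedelta(hours=12)).dt.floor('D').to_numpy()
    # Position of the first reference date that is >= `earliest`.
    pos = np.searchsorted(ref, earliest, side='left')
    # Assumption: anything past the end maps to the last date plus one day.
    candidates = np.append(ref, ref[-1] + np.timedelta64(1, 'D'))
    return pd.Series(candidates[pos], index=timestamps.index,
                     name=timestamps.name)


result = snap_to_reference_dates(df2['Col_2'], df1['Col_1'])
print(result)
```

On the sample data this gives the same seven dates as the expected Col_2 in the question. Unlike the function above, it never returns None, so check that the fallback suits your data before relying on it.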