Group the datetime based on another array of the datetime in pandas - Python

Question

Assume I have two pandas DataFrames with two datetime columns, as follows:

Col_1	Col_2
2023-01-02	2023-01-02 9:00
2023-01-03	2023-01-02 20:00
2023-01-04	2023-01-03 1:00
2023-01-05	2023-01-04 9:00
2023-01-08	2023-01-05 16:00
	2023-01-06 9:00
	2023-01-08 12:00

I want to map each Col_2 datetime onto a Col_1 date under the following conditions:

If the time is before 12pm, it keeps the same date when Col_1 contains that date; if not, it switches to the next Col_1 date. If the time is at or after 12pm, it switches to the next date in Col_1.

However, if the last value is at or after 12pm and there is no later date in Col_1, just add one day to the last date in Col_1.

For example, I expect Col_2 to look like this:

Col_1	Col_2
2023-01-02	2023-01-02
2023-01-03	2023-01-03
2023-01-04	2023-01-03
2023-01-05	2023-01-04
2023-01-08	2023-01-08
	2023-01-08
	2023-01-09

Is there a function in pandas that does this? If not, how can I write the code?

Are these two columns from two different DataFrames?
– Derek O, Commented Feb 24, 2023 at 18:01

Yes, they are from different DataFrames.
– Chi-Yuan Li, Commented Feb 24, 2023 at 18:20

Derek O · Accepted Answer · 2023-02-25 08:10:12Z

You can write a custom function that looks up the date in df1, and then apply it to df2['Col_2']:

    import pandas as pd

    df1 = pd.DataFrame({
        'Col_1': pd.to_datetime([
            '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-08'
        ])
    })
    df2 = pd.DataFrame({
        'Col_2': pd.to_datetime([
            '2023-01-02 9:00', '2023-01-02 20:00', '2023-01-03 1:00',
            '2023-01-04 9:00', '2023-01-05 16:00', '2023-01-06 9:00',
            '2023-01-08 12:00'])
    })

    ## create a function that takes a datetime as an input,
    ## and returns the correct datetime from df1 'Col_1' as a reference
    ## (assumes 'Col_1' is sorted and df1 has the default 0..n-1 index)
    def get_date_from_df1(ts, df1):
        day = ts.floor(freq='d')
        ts_in_df1 = sum(df1['Col_1'] == day)
        ## get index of the ts:
        if ts_in_df1 == 1:
            idx = df1.loc[df1['Col_1'] == day].index[0]
            ## at or after 12pm, move on to the next date in df1
            if ts.hour >= 12:
                ## no later date in df1: add one day to the last date
                if idx + 1 > df1.index[-1]:
                    return day + pd.Timedelta(days=1)
                idx = idx + 1
        ## the date isn't in df1, so take the first later date
        elif sum(df1['Col_1'] > day) >= 1:
            idx = df1.loc[df1['Col_1'] > day].index[0]
        ## if no matching or later date is found in df1, we return None (to avoid an error)
        else:
            return None
        return df1['Col_1'].iloc[idx]

    df2['Col_2'] = df2['Col_2'].apply(lambda ts: get_date_from_df1(ts, df1))

Result:

    >>> df2
           Col_2
    0 2023-01-02
    1 2023-01-03
    2 2023-01-03
    3 2023-01-04
    4 2023-01-08
    5 2023-01-08
    6 2023-01-09
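If the frames are large, a row-by-row apply can get slow. As a rough alternative sketch (not part of the answer above), the same rules can be expressed with a sorted lookup: shifting each timestamp forward by 12 hours pushes anything at or after 12pm onto the next calendar day, and np.searchsorted then finds the first Col_1 date on or after that day. The helper name snap_to_reference is hypothetical, and the sketch assumes Col_1 is non-empty, sorted, unique, and holds midnight dates. It also assumes that any timestamp falling past the last Col_1 date, morning or afternoon, should get the last date plus one day, which goes slightly beyond what the question specifies.

```python
import numpy as np
import pandas as pd


def snap_to_reference(ts: pd.Series, ref: pd.Series) -> pd.Series:
    """Map each timestamp in `ts` to a date in `ref` using a 12pm cutoff.

    Hypothetical helper; assumes `ref` is non-empty, sorted, unique,
    and contains midnight-normalized dates.
    """
    ref_vals = ref.to_numpy()
    # Adding 12 hours moves times at or after 12pm onto the next calendar
    # day, so one lookup covers both the "before noon" and "after noon" rules.
    target = (ts + pd.Timedelta(hours=12)).dt.normalize().to_numpy()
    # Position of the first reference date that is >= the target day.
    pos = np.searchsorted(ref_vals, target, side="left")
    # Positions past the end have no later reference date (assumption:
    # fall back to the last reference date plus one day in every such case).
    overflow = pos >= len(ref_vals)
    picked = ref_vals[np.minimum(pos, len(ref_vals) - 1)]
    fallback = ref_vals[-1] + np.timedelta64(1, "D")
    out = np.where(overflow, fallback, picked)
    return pd.Series(out, index=ts.index, name=ts.name)


df1 = pd.DataFrame({
    'Col_1': pd.to_datetime([
        '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-08'
    ])
})
df2 = pd.DataFrame({
    'Col_2': pd.to_datetime([
        '2023-01-02 9:00', '2023-01-02 20:00', '2023-01-03 1:00',
        '2023-01-04 9:00', '2023-01-05 16:00', '2023-01-06 9:00',
        '2023-01-08 12:00'])
})

result = snap_to_reference(df2['Col_2'], df1['Col_1'])
print(result.dt.strftime('%Y-%m-%d').tolist())
```

On the sample data this gives the same seven dates as the expected Col_2 in the question. If Col_1 is not already sorted, sorting it first (for example with sort_values) would be needed before the lookup.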
