I built my code after studying this question: "Divide total sum equally to higher sampled time periods when upsampling with pandas".
I am wondering whether the code is correct and whether it can be improved. It seems to work properly, but I am always looking for room for improvement, and for any downside of the process I implemented that I am not aware of.
Context:
I have monthly data from influencer activities. The metrics are reach and engagement.
Goal:
My goal is to split this information equally across the weeks of each month, with weeks starting on Monday.
Major Concerns:
A month contains on average about 4.35 weeks (365.25 / 12 / 7).
Am I losing information with the approach below when a week overlaps two months?
    # Create a minimal reproducible example
    import pandas as pd
    import numpy as np

    date_index = pd.date_range(start='01/01/2020', end='01/12/2022', freq="MS", inclusive="left")
    np.random.seed(0)
    reach = np.random.randint(1000, 10000, len(date_index))
    engagement = np.random.randint(100, 1000, len(date_index))
    reach_engagement = {"reach": reach, "engagement": engagement}
    df = pd.DataFrame(data=reach_engagement, index=date_index)

    # Resampling (the goal of the question)
    df_resample = df.resample('W-MON').fillna("pad")
    len(df_resample)
    df_resample = df_resample / 7
    df_resample.round()

Output:
                 reach  engagement
    2020-01-06   533.0       136.0
    2020-01-13   533.0       136.0
    2020-01-20   533.0       136.0
    2020-01-27   533.0       136.0
    2020-02-03   609.0        28.0
    ...
    2021-12-20  1228.0       129.0
    2021-12-27  1228.0       129.0
    2022-01-03   499.0        33.0

    105 rows × 2 columns
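One quick way to check whether information is lost is to compare the column totals before and after resampling. The following is only a sketch of such a check: it rebuilds a similar frame (the random values come from `default_rng`, so they differ from the ones above), and it uses `.ffill()`, which is the same forward fill as `.fillna("pad")` on a resampler.

```python
import numpy as np
import pandas as pd

# Same month-start index as in the example above, written in ISO format
date_index = pd.date_range(start="2020-01-01", end="2022-01-12", freq="MS")

# Illustrative values only; they do not reproduce the numbers shown above
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "reach": rng.integers(1000, 10000, len(date_index)),
        "engagement": rng.integers(100, 1000, len(date_index)),
    },
    index=date_index,
)

# The approach from the question: forward-fill onto Monday-anchored weeks, divide by 7
weekly = df.resample("W-MON").ffill() / 7

# If the split preserved the data, this ratio would be 1.0 for both columns
ratio = weekly.sum() / df.sum()
print(ratio)
```

Because each monthly value is repeated on at most five weekly rows and then divided by 7, the weekly totals necessarily come out below the monthly totals, so the ratio stays under 1.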

MS (month-start) is accurate, i.e. the other columns correspond to months starting on the first and not months of consistent length? If so, your current method introduces error.
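One possible way around the error described here, sketched under the assumption that "equally" may be read as "an equal share per day of the month": spread each monthly total over the days of that month, then sum the days into weekly bins. A week that straddles two months then receives a share from each, and the totals are preserved. The two-month frame and its numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical monthly totals, chosen so the per-day share is a round number
monthly = pd.DataFrame(
    {"reach": [3100, 2900], "engagement": [310, 290]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01"]),
)

# Every calendar day from the first month start to the end of the last month
last_day = monthly.index[-1] + pd.offsets.MonthEnd(0)
days = pd.date_range(monthly.index[0], last_day, freq="D")

# Per-day share: monthly total divided by the number of days in that month
per_day = monthly.div(monthly.index.days_in_month.to_numpy(), axis=0)

# Give every day of a month that month's per-day share
daily = per_day.reindex(days, method="ffill")

# Same "W-MON" rule as the code above; pandas labels each bin by the Monday that closes it
weekly = daily.resample("W-MON").sum()
print(weekly)
```

With this approach the first and last weekly rows cover fewer than seven days of data, so they are smaller than the rows in between, and the week labelled 2020-02-03 combines four days of January with three days of February.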