\$\begingroup\$

I built my code after studying this question: "Divide total sum equally to higher sampled time periods when upsampling with pandas".

I am wondering whether the code can be improved and whether it is correct. It seems to work properly, but I am always looking for room for improvement, and for any downside of my process that I am not aware of.

Context:

I have monthly data from influencer activities. The metrics are Reach and Engagement.

Goal:

My goal is to split this information equally across the weeks of each month, with weeks starting on Monday.

Major Concerns:

A month is composed of about 4.34 weeks on average.

Am I losing information with the following approach when a week overlaps two months?

    # Creating a minimal reproducible example
    import pandas as pd
    import numpy as np

    date_index = pd.date_range(start='01/01/2020', end='01/12/2022', freq="MS", inclusive="left")

    np.random.seed(0)
    reach = np.random.randint(1000, 10000, len(date_index))
    engagement = np.random.randint(100, 1000, len(date_index))
    reach_engagement = {"reach": reach, "engagement": engagement}
    df = pd.DataFrame(data=reach_engagement, index=date_index)

    # Resampling (the goal of the question)
    df_resample = df.resample('W-MON').fillna("pad")
    len(df_resample)
    df_resample = df_resample/7
    df_resample.round()

Output:

                 reach  engagement
    2020-01-06   533.0       136.0
    2020-01-13   533.0       136.0
    2020-01-20   533.0       136.0
    2020-01-27   533.0       136.0
    2020-02-03   609.0        28.0
    ...
    2021-12-20  1228.0       129.0
    2021-12-27  1228.0       129.0
    2022-01-03   499.0        33.0

    105 rows × 2 columns
\$\endgroup\$
  • \$\begingroup\$ I understand that these data are artificial. Are you sure that MS (month-start) is accurate, i.e. the other columns correspond to months starting on the first and not months of consistent length? If so, your current method introduces error. \$\endgroup\$ Commented Jan 21, 2023 at 17:27
  • \$\begingroup\$ @Reinderien I think so: in the open Colab notebook that I used to craft and test the question, it works properly. Please let me know if you find any discrepancy between the question and the code, and I will fix it :) \$\endgroup\$ Commented Jan 23, 2023 at 21:21

1 Answer

\$\begingroup\$

Your direct use of df.resample('W-MON') neglects the fact that your reach and engagement are applied to months, which presumably all start on the first of the month and have variable length. W-MON is not variable-length. Instead you need to do something like:

  • Upsample to days
  • Divide by the number of days in the given month
  • Only then, downsample to weeks

This could look like:

    from datetime import date

    import numpy as np
    import pandas as pd
    from numpy.random import default_rng

    months = pd.date_range(
        start=date(2020, 1, 1), freq='MS', name='month_start',
        end=date(2023, 1, 1),
    )
    n = len(months) - 1

    rand = default_rng(seed=0)
    df = pd.DataFrame(
        data={'reach': rand.integers(1_000, 10_000, n),
              'engagement': rand.integers(100, 1_000, n)},
        index=months[:-1])
    df.loc[months[-1], :] = [np.nan, np.nan]  # for resampling

    by_day = df.resample('D').ffill().iloc[:-1, :]
    month_days = by_day.reach.groupby(pd.Grouper(freq='M')).transform('count')
    by_day.reach /= month_days
    by_day.engagement /= month_days

    by_week = by_day.resample('W-MON').sum()

[Image: generated dataframe]

[Image: week-adjusted]
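As a sanity check on the day-then-week approach, here is a minimal sketch on a small hand-made table. The values and the names `monthly`, `days` and `days_in_month` are made up for illustration and are not from the question's data. It uses `DatetimeIndex.days_in_month` instead of a grouped count, which should give the same divisor. The point is that totals are conserved, and that a week straddling two months receives a share from each.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly totals indexed by month start (illustrative values only).
monthly = pd.DataFrame(
    {"reach": [3100.0, 2900.0, 6200.0], "engagement": [310.0, 580.0, 930.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="MS"),
)

# One row per calendar day, from the first day of the first month
# through the last day of the last month.
days = pd.date_range(
    monthly.index[0],
    monthly.index[-1] + pd.offsets.MonthEnd(0),
    freq="D",
)

# Spread each month's total over its days: forward-fill the total,
# then divide by the length of the month each day belongs to.
days_in_month = pd.Series(days.days_in_month.to_numpy(), index=days)
by_day = monthly.reindex(days).ffill().div(days_in_month, axis=0)

# Aggregate the daily shares into weekly bins labelled by their ending Monday.
by_week = by_day.resample("W-MON").sum()

# The weekly values add up to the original monthly totals.
assert np.allclose(by_week.sum(), monthly.sum())

# The bin ending Monday 2020-03-02 holds 5 days of February (100 per day)
# plus 2 days of March (200 per day).
assert np.isclose(by_week.loc[pd.Timestamp("2020-03-02"), "reach"], 900.0)
```

By contrast, forward-filling the monthly total onto each week and dividing by a fixed constant will generally not add back up to the monthly totals, because months contain a varying, non-integer number of weeks.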

\$\endgroup\$
  • \$\begingroup\$ Yes, Reach and Engagement are attributed to each month by the data source. I get your point and I agree 100%. \$\endgroup\$ Commented Jan 23, 2023 at 21:24
