Pandas Upsampling Time Series Splitting Equally the values through the weeks starting on monday

Question

I built my code after studying this question: "Divide total sum equally to higher sampled time periods when upsampling with pandas".

I am wondering whether the code can be improved and whether it is correct. It seems to work properly, but I am always looking for room for improvement, and for any downside of the process I implemented that I am not aware of.

Context:

I have monthly data from influencer activities. The metrics are Reach and Engagement.

Goal:

My goal is to split this information equally across the weeks of the month, with weeks starting on Monday.

Major Concerns:

A month consists of about 4.34 weeks on average.
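To make the arithmetic behind that average concrete, here is a minimal sketch using only the standard library. The helper names `weeks_in_month` and `days_per_monday_week` are hypothetical, introduced for illustration. It computes how many 7-day weeks each month of 2020 and 2021 spans, and, because a week can straddle a month boundary, how many days of each Monday-start week fall inside a given month.

```python
import calendar
from collections import Counter
from datetime import date, timedelta


def weeks_in_month(year: int, month: int) -> float:
    """Length of a calendar month expressed in 7-day weeks."""
    return calendar.monthrange(year, month)[1] / 7


def days_per_monday_week(year: int, month: int) -> dict:
    """Map the Monday of each week to the number of its days inside the month."""
    n_days = calendar.monthrange(year, month)[1]
    counts = Counter()
    for day in range(1, n_days + 1):
        d = date(year, month, day)
        monday = d - timedelta(days=d.weekday())  # weekday(): Monday == 0
        counts[monday] += 1
    return dict(counts)


lengths = [weeks_in_month(y, m) for y in (2020, 2021) for m in range(1, 13)]
average = sum(lengths) / len(lengths)
print(f"average weeks per month: {average:.2f}")
print(f"shortest: {min(lengths):.2f}, longest: {max(lengths):.2f}")

# Days of each Monday-start week that belong to January 2020
for monday, n in days_per_monday_week(2020, 1).items():
    print(monday, n)
```

The average agrees with the figure quoted above up to rounding, but individual months range from exactly 4 weeks to about 4.43. For January 2020, the weeks starting on 2019-12-30 and 2020-01-27 each contain only five January days, so any week-based split has to decide what to do with such partial weeks.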

Am I losing any information with the approach below when a week overlaps two months?

    # Creating a minimal reproducible example
    import pandas as pd
    import numpy as np

    date_index = pd.date_range(start='01/01/2020', end='01/12/2022', freq="MS", inclusive="left")
    np.random.seed(0)
    reach = np.random.randint(1000, 10000, len(date_index))
    engagement = np.random.randint(100, 1000, len(date_index))
    reach_engagement = {"reach": reach, "engagement": engagement}
    df = pd.DataFrame(data=reach_engagement, index=date_index)

    # Resampling (the goal of the question)
    df_resample = df.resample('W-MON').fillna("pad")
    len(df_resample)
    df_resample = df_resample / 7
    df_resample.round()

Output:

                 reach  engagement
    2020-01-06   533.0       136.0
    2020-01-13   533.0       136.0
    2020-01-20   533.0       136.0
    2020-01-27   533.0       136.0
    2020-02-03   609.0        28.0
    ...
    2021-12-20  1228.0       129.0
    2021-12-27  1228.0       129.0
    2022-01-03   499.0        33.0

    105 rows × 2 columns
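One way to check whether information is lost is to compare the totals before and after resampling. The following is a minimal sketch, not the data from the question: the three monthly totals are hypothetical values chosen for easy arithmetic, and `.ffill()` is used as the equivalent of `fillna("pad")`.

```python
import pandas as pd

# Hypothetical monthly totals (not the random data from the question).
monthly = pd.Series(
    [3100.0, 5800.0, 9300.0],
    index=pd.date_range("2020-01-01", periods=3, freq="MS"),
    name="reach",
)

# Same steps as in the question: forward-fill onto W-MON labels, divide by 7.
weekly = monthly.resample("W-MON").ffill() / 7

print(weekly.round(1))
print("monthly total:", monthly.sum())
print("weekly total: ", round(weekly.sum(), 1))
```

With these inputs the weekly values do not add up to the monthly totals: each weekly row is simply a copy of the most recent monthly value divided by 7, and the number of copies per month depends on how many Mondays fall in it, not on the month's length.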

I understand that these data are artificial. Are you sure that MS (month-start) is accurate, i.e. that the other columns correspond to months starting on the first, and not to months of consistent length? If so, your current method introduces error.
– Reinderien, Commented Jan 21, 2023 at 17:27
@Reinderien I think so, because it works properly in the open Colab notebook that I used to craft and test the question. Please let me know if you find any discrepancy between the question and the code, and I will fix it :)
– Andrea Ciufo, Commented Jan 23, 2023 at 21:21

Reinderien · Accepted Answer · 2023-01-22 04:22:14Z

Your direct use of df.resample('W-MON') neglects the fact that your reach and engagement are applied to months, which presumably all start on the first of the month and have variable length. W-MON is not variable-length. Instead you need to do something like:

1. Upsample to days
2. Divide by the number of days in the given month
3. Only then, downsample to weeks

This could look like:

    from datetime import date

    import numpy as np
    import pandas as pd
    from numpy.random import default_rng

    months = pd.date_range(
        start=date(2020, 1, 1), freq='MS', name='month_start',
        end=date(2023, 1, 1),
    )
    n = len(months) - 1

    rand = default_rng(seed=0)
    df = pd.DataFrame(
        data={'reach': rand.integers(1_000, 10_000, n),
              'engagement': rand.integers(100, 1_000, n)},
        index=months[:-1])
    df.loc[months[-1], :] = [np.nan, np.nan]  # for resampling

    # Upsample to days; drop the placeholder row added above
    by_day = df.resample('D').ffill().iloc[:-1, :]
    # Number of days in each row's month ('M' is spelled 'ME' in pandas 2.2+)
    month_days = by_day.reach.groupby(pd.Grouper(freq='M')).transform('count')
    by_day.reach /= month_days
    by_day.engagement /= month_days

    # Downsample to weeks
    by_week = by_day.resample('W-MON').sum()
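As a sanity check on this approach, the sketch below repeats the three steps on small hypothetical totals and confirms that the weekly values add back up to the monthly ones. Two things differ from the code above and are assumptions of this sketch rather than part of the answer: it uses `DatetimeIndex.days_in_month` instead of a grouped count, and it passes `closed='left', label='left'` to `resample`. By default pandas closes and labels weekly bins on the right, so a plain `'W-MON'` bin ends on a Monday; the two extra arguments make each bin run Monday through Sunday and label it by its Monday, which may be closer to "weeks starting on Monday".

```python
import pandas as pd

# Hypothetical monthly totals, indexed by month start (not the answer's data).
monthly = pd.Series(
    [3100.0, 5800.0, 9300.0],
    index=pd.date_range("2020-01-01", periods=3, freq="MS"),
    name="reach",
)

# Step 1: one row per calendar day, through the end of the last month.
last = monthly.index[-1]
days = pd.date_range(
    monthly.index[0],
    last + pd.Timedelta(days=last.days_in_month - 1),
    freq="D",
)
by_day = monthly.reindex(days).ffill()

# Step 2: divide each day's value by the length of its own month.
by_day = by_day / days.days_in_month.to_numpy()

# Step 3: sum days into Monday-to-Sunday weeks, labelled by their Monday.
by_week = by_day.resample("W-MON", closed="left", label="left").sum()

print(by_week)
print("totals match:", abs(by_week.sum() - monthly.sum()) < 1e-9)
```

A week that straddles two months, such as the one starting 2020-01-27, receives five days at the January daily rate plus two days at the February rate, so nothing is lost at month boundaries. The first and last weeks are partial because the data begin on a Wednesday and end on a Tuesday.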

Yes, Reach and Engagement are attributed to each month by the data source. I get your point and I 100% agree.
– Andrea Ciufo, Commented Jan 23, 2023 at 21:24
