The code doesn't run because the current version of Pandas sort_values needs to accept booleans for ascending. (But those can be omitted entirely because True is the default.)
More broadly: the function fails to declare its signature types, mutates df when it shouldn't, is far too complex, includes several instances of repeated expressions that should be de-duplicated, fails to vectorise, and inappropriately returns a result containing a temporary column.
week is baffling. You claim
The code works (obviously)
I really wouldn't be so sure. week is not the week of the month, nor the week of the year, nor (as one may guess from examining the output) the number of weeks from the first week seen per merchant group. The latter would be calculated via something like
merchant_timedelta = result2.index - ( result2.reset_index() .groupby('merchant_name')['date'] .transform('first') .set_axis(result2.index) ) merchant_weeks = merchant_timedelta // pd.Timedelta(1, unit='W')
so the original implementation is either simply wrong, or poorly-described and poorly-calculated. I ignore it, which reduces a refactored version to
def date_manipulate_refactored(df: pd.DataFrame, start_day: int) -> pd.DataFrame: result = df.sort_values(by=['merchant_name', 'date']) return result.assign( Month=result.index.strftime('%B'), DOW=(result.index.dayofweek + 1 - start_day) % 7, )
This by the way fixes the eight-some warnings that the original code throws:
189787.py:15: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` df.replace({"DOW": week_offset}, inplace=True) 189787.py:21: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]` current_merchant = df['merchant_name'][0] ...
This can be tested via something like
import collections import numpy as np import pandas as pd def date_manipulate(df, startday): df['Month'] = df.index.strftime("%B") df['DOW'] = df.index.strftime("%A") week_offset = collections.OrderedDict() default_week = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'] temp_week = default_week[startday:] + default_week[0:startday] for index, day in enumerate(temp_week): week_offset[day] = index df.replace({"DOW": week_offset}, inplace=True) df=df.reset_index().sort_values( ['merchant_name', 'date'], ascending=[True, True], ).set_index(['date']) df['temp_date'] = df.index.map(lambda x: x.toordinal()) current_merchant = df['merchant_name'][0] offset = list(week_offset.values())[df['DOW'][0]] startdate = df['temp_date'][0]-(offset) enddate = startdate+6 week=1 df['week']=0 df.reset_index(inplace=True) for counter, day in enumerate (df['temp_date']): if df['merchant_name'][counter]==current_merchant: if(df['temp_date'][counter])<=enddate: df.loc[counter,'week']=week else: enddate+=7 week+=1 df.loc[counter,'week']=week else: offset=list(week_offset.values())[df['DOW'][counter]] current_merchant=df.loc[counter,'merchant_name'] startdate=df.loc[counter,'temp_date']-(offset) enddate=startdate+6 week=1 df.loc[counter,'week']=week df.set_index('date',inplace=True) return df def date_manipulate_refactored(df: pd.DataFrame, start_day: int) -> pd.DataFrame: result = df.sort_values(by=['merchant_name', 'date']) return result.assign( Month=result.index.strftime('%B'), DOW=(result.index.dayofweek + 1 - start_day) % 7, ) def demo() -> None: rand = np.random.default_rng(seed=0) dates = pd.date_range( name='date', start='2020-01-01T00:00', end='2020-04-01T00:00', freq='12h', ) merchant_points = rand.choice( a=rand.integers( low=ord('a'), high=1 + ord('z'), size=(20, 7), ), size=len(dates), ) df = pd.DataFrame( index=dates, data={ 'merchant_name': [ ''.join(chr(x) for x in line) for line in merchant_points ], }, ) result = date_manipulate(df=df.copy(), startday=5) result2 = date_manipulate(df, startday=5) pd.testing.assert_frame_equal( left=result[['merchant_name', 'Month', 'DOW']], right=result2[['merchant_name', 'Month', 'DOW']], ) if __name__ == '__main__': demo()