Closed
Groupby is missing "semester"
    import pandas as pd
    import numpy as np

    days = pd.date_range(start="2017-05-17", end="2017-11-29", freq="1D")
    df = pd.DataFrame({'DTIME': days, 'DATA': np.random.randint(50, high=80, size=len(days))})
    df.set_index('DTIME', inplace=True)
    grouped = df.groupby(pd.Grouper(freq='2QS'))  # group by 2 quarters, start
    print("Groups date start:")
    for dtime, group in grouped:
        print(dtime)
        # print(group)

returns groups based on the first datetime index of the dataset, not on the calendar semesters that begin on January 1st and July 1st:

    Groups date start:
    2017-04-01 00:00:00   <=== this is because the first datetime index is in May 2017
    2017-10-01 00:00:00

while I would expect:

    Groups date start:
    2017-01-01 00:00:00
    2017-07-01 00:00:00

This issue is difficult to spot, as the behaviour changes according to the dataset, while it should be consistent. I didn't spot it with my first dataset (which started in January).
The same problem shows up when grouping by 6MS (six months, start).
A semester frequency is missing from pandas' offset aliases.
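
For reference, here is a minimal sketch of one possible workaround, assuming semesters always start on January 1st and July 1st. It does not use any built-in semester alias (none exists, which is the point of this report). Instead it computes the semester start for each timestamp by hand and groups on that key. The helper name `semester_start` is hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def semester_start(ts):
    """Hypothetical helper: first day of the calendar semester containing ts.

    Assumes semesters begin on January 1st and July 1st.
    """
    first_month = 1 if ts.month <= 6 else 7
    return pd.Timestamp(year=ts.year, month=first_month, day=1)


days = pd.date_range(start="2017-05-17", end="2017-11-29", freq="1D")
df = pd.DataFrame(
    {"DATA": np.random.randint(50, high=80, size=len(days))},
    index=pd.DatetimeIndex(days, name="DTIME"),
)

# Build one grouping key per row, independent of where the dataset starts.
keys = pd.DatetimeIndex([semester_start(ts) for ts in df.index], name="SEMESTER")
grouped = df.groupby(keys)

starts = [dtime for dtime, group in grouped]
print("Groups date start:")
for dtime in starts:
    print(dtime)
```

Because the key depends only on each row's own date, the group labels stay the same whether the dataset starts in January or in May.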