Add "semester" as a time/date component to DatetimeIndex #22362

@Nemecsek

Groupby is missing "semester"

    import numpy as np
    import pandas as pd

    days = pd.date_range(start="2017-05-17", end="2017-11-29", freq="1D")
    df = pd.DataFrame({'DTIME': days, 'DATA': np.random.randint(50, high=80, size=len(days))})
    df.set_index('DTIME', inplace=True)

    grouped = df.groupby(pd.Grouper(freq='2QS'))  # group by 2 quarters, start
    print("Groups date start:")
    for dtime, group in grouped:
        print(dtime)
        # print(group)

This returns groups anchored to the first datetime index of the dataset, not to the calendar semesters that begin on January 1st and July 1st:

    Groups date start:
    2017-04-01 00:00:00   <=== this is because the first datetime index is in May 2017
    2017-10-01 00:00:00

while I would expect:

    Groups date start:
    2017-01-01 00:00:00
    2017-07-01 00:00:00

This issue is difficult to spot because the behaviour changes with the dataset, when it should be consistent. I didn't notice it with my first dataset, which started in January.

The same problem shows up when grouping by 6MS (six months, start).

A semester frequency is missing from pandas' offset aliases.
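Until such an alias exists, one possible workaround is to build the semester key by hand and group on it. This is only a minimal sketch: `semester_start` is a hypothetical helper, not part of pandas, and it assumes semesters always start on January 1st and July 1st. A seeded generator replaces `np.random.randint` so the data is reproducible.

```python
import numpy as np
import pandas as pd

days = pd.date_range(start="2017-05-17", end="2017-11-29", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"DATA": rng.integers(50, 80, size=len(days))}, index=days)
df.index.name = "DTIME"


def semester_start(index):
    """Hypothetical helper: map each timestamp to the first day of its
    calendar semester (January 1st or July 1st of the same year)."""
    first_month = np.where(index.month <= 6, 1, 7)
    return pd.DatetimeIndex(
        [pd.Timestamp(year=int(y), month=int(m), day=1)
         for y, m in zip(index.year, first_month)]
    )


# Group on the computed key instead of pd.Grouper(freq='2QS'),
# so the bins do not depend on where the data happens to start.
keys = semester_start(df.index)
sizes = df.groupby(keys).size()

print("Groups date start:")
for dtime in sizes.index:
    print(dtime)
```

Because the key is derived from each row's own year and month, the group labels stay the same no matter what the first date in the dataset is.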
