Pandas dataframe groupby datetime month

Question

Consider a CSV file:

string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0

I can read this in, and reformat the date column into datetime format:

b = pd.read_csv('b.dat')
b['date'] = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')

I have been trying to group the data by month. It seems like there should be an obvious way of accessing the month and grouping by that. But I can't seem to do it. Does anyone know how?

What I am currently trying is re-indexing by the date:

b.index = b['date']

I can access the month like so:

b.index.month

However I can't seem to find a function to lump together by month.

If you are struggling to apply any of the answers, keep in mind that in this question (and therefore in the answers) the datetime values are assigned to the index of the DataFrame. A quick tip: if you have a datetime column instead, you can access the individual year/month/day/hour/minute values directly, e.g. my_df.my_column.dt.month.
– Federico Dorato, Commented Nov 30, 2020 at 13:03
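The tip in the comment above can be sketched as follows. This is a minimal illustration, assuming a small hypothetical frame whose dates stay in a column named `date` (the data is made up to resemble the question's CSV, not read from `b.dat`):

```python
import pandas as pd

# Hypothetical frame resembling the question's CSV; dates stay in a column.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-02-05 09:16", "2011-03-05 22:44",
        "2011-04-22 12:07", "2011-04-22 12:10",
    ]),
    "number": [1.0, 2.0, 3.0, 4.0],
})

# Group on the month extracted through the .dt accessor.
# The DataFrame index is left untouched.
monthly_sum = df.groupby(df["date"].dt.month)["number"].sum()
print(monthly_sum)
```

Note that grouping on the month number alone merges the same month from different years, which is why the answers below also bring in the year.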

goodside · Accepted Answer · 2020-03-05 18:12:07Z

276

Managed to do it:

b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.groupby(by=[b.index.month, b.index.year])

Or

b.groupby(pd.Grouper(freq='M')) # update for v0.21+
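A minimal sketch of both variants with an aggregation applied, assuming a small hypothetical frame in place of `b.dat`. One deliberate deviation from the answer: a `pd.offsets.MonthEnd()` object is passed as `freq` instead of the `'M'` string, because pandas 2.2 renamed that alias to `'ME'` and the offset object sidesteps the difference between versions.

```python
import pandas as pd

# Hypothetical data standing in for b.dat, indexed by timestamp.
b = pd.DataFrame(
    {"number": [1.0, 2.0, 3.0, 4.0, 6.0]},
    index=pd.to_datetime([
        "2011-02-05 09:16", "2011-03-05 22:44", "2011-04-22 12:07",
        "2011-04-22 12:10", "2014-05-02 15:02",
    ]),
)

# Variant 1: explicit month/year keys. Only (month, year) pairs that
# actually occur in the data produce a group.
by_keys = b.groupby(by=[b.index.month, b.index.year])["number"].sum()

# Variant 2: calendar-month bins labelled by month end. Months with no
# rows between the first and last timestamp appear as empty bins.
by_bins = b.groupby(pd.Grouper(freq=pd.offsets.MonthEnd()))["number"].sum()

print(by_keys)
print(by_bins[by_bins > 0])
```

The two results are not interchangeable: the first has a (month, year) MultiIndex, while the second has a DatetimeIndex that includes the empty months between 2011 and 2014.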

edited Mar 5, 2020 at 18:12

goodside


answered Jun 6, 2014 at 13:38

Lee



6 Comments

Karl D. Over a year ago

I think the more pandonic ways are to either use resample (when it provides the functionality you need) or use a TimeGrouper: df.groupby(pd.TimeGrouper(freq='M'))

Alexandre Over a year ago

to get the result DataFrame sum or average, df.groupby(pd.TimeGrouper(freq='M')).sum() or df.groupby(pd.TimeGrouper(freq='M')).mean()

BallpointBen Over a year ago

pd.TimeGrouper has been deprecated in favor of pd.Grouper, which is a bit more flexible but still takes freq and level arguments.

goodside Over a year ago

@ely The answer implicitly relies on the lines in the original question where b is given an index after being read from CSV. Add b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p') after the line b = pd.read_csv('b.dat'). [I've edited the answer just now too.]

MangoNrFive Over a year ago

To group the months in chronological order, you need to swap the month and year index. The resulting command for the grouping being b.groupby(by=[b.index.year, b.index.month]).
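The ordering point in the last comment can be checked on a tiny hypothetical frame (the dates below are made up for illustration):

```python
import pandas as pd

# Hypothetical data spanning two years, deliberately not in date order.
b = pd.DataFrame(
    {"number": [5.0, 7.0, 1.0]},
    index=pd.to_datetime(["2011-05-02", "2014-02-03", "2011-02-05"]),
)

# Month first: group keys sort by month, so February 2014 lands
# before May 2011.
month_first = b.groupby(by=[b.index.month, b.index.year])["number"].sum()

# Year first: group keys sort chronologically.
year_first = b.groupby(by=[b.index.year, b.index.month])["number"].sum()

print(month_first)
print(year_first)
```

Because groupby sorts its keys by default, the order of the keys in the `by` list determines the order of the output rows.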


PandasRocks · Accepted Answer · 2018-01-20 12:38:02Z

126

(update: 2018)

Note that pd.TimeGrouper is deprecated and will be removed. Use this instead:

 df.groupby(pd.Grouper(freq='M'))

answered Jan 20, 2018 at 12:38

PandasRocks


4 Comments

Kim Over a year ago

Find the Grouper docs here and the frequency specifications (freq=...) here. Some examples are freq='D' for days, freq='B' for business days, freq='W' for weeks or even freq='Q' for quarters.

Edward Over a year ago

I found it useful to use 'key' to avoid having to reindex the df, as follows: df.groupby(pd.Grouper(key='your_date_column', freq='M'))

exlo Over a year ago

Does this work if you're grouping by two columns, only one of which is datetime value column?

Piotr Over a year ago

speeding up the further research for those who want to group by specific column (or more): df.groupby(['col1', pd.Grouper(key='date_col', freq='1M')]).agg({ 'col2': 'sum', 'col3': 'max' })
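The pattern from the last two comments, grouping by an ordinary column together with a month bin on a date column, might look like this in full. The column names come from the comment; the data is hypothetical, and a `pd.offsets.MonthEnd()` object is used in place of the `'1M'` string so the sketch does not depend on which alias spelling a given pandas version accepts.

```python
import pandas as pd

# Hypothetical data; column names follow the comment above.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "date_col": pd.to_datetime(
        ["2021-01-05", "2021-01-20", "2021-01-07", "2021-02-11"]
    ),
    "col2": [1, 2, 3, 4],
    "col3": [10, 30, 20, 40],
})

# Group by a plain column and a month-end bin on the date column,
# with no need to move the dates into the index first.
monthly = (
    df.groupby(["col1", pd.Grouper(key="date_col", freq=pd.offsets.MonthEnd())])
      .agg({"col2": "sum", "col3": "max"})
)
print(monthly)
```

The result is indexed by (col1, month end), so a single cell can be read with `monthly.loc[("a", pd.Timestamp("2021-01-31")), "col2"]`.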

Mykola Zotko · Accepted Answer · 2021-07-05 14:17:25Z

33

To group time-series data, you can use the resample method. For example, to group by month:

df.resample(rule='M', on='date')['Values'].sum()

You can find the list of offset aliases here.
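A self-contained version of the one-liner above, assuming a hypothetical frame with `date` and `Values` columns. The `rule` is given as a `pd.offsets.MonthEnd()` object rather than the `'M'` string, since pandas 2.2 renamed that alias to `'ME'`:

```python
import pandas as pd

# Hypothetical data with the column names used in the answer.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-09-01", "2017-10-01", "2017-10-05", "2017-10-20"]
    ),
    "Values": [20, 15, 5, 10],
})

# Resample on the 'date' column into calendar months and sum each bin.
# The resulting index holds the last day of each month.
monthly = df.resample(rule=pd.offsets.MonthEnd(), on="date")["Values"].sum()
print(monthly)
```

The `on` parameter lets resample work on a column, so the dates do not have to be the index.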

edited Jul 5, 2021 at 14:17

answered Jun 8, 2021 at 8:16

Mykola Zotko


1 Comment

Michael Dorner Over a year ago

Thanks! This should be the accepted answer: groupby by time is called resample.

jpp · Accepted Answer · 2020-11-25 17:58:51Z

One solution which avoids MultiIndex is to create a new datetime column setting day = 1. Then group by this column.

Normalise day of month

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01']),
                   'Values': [5, 10, 15, 20]})
# normalize day to beginning of month, 4 alternative methods below
df['YearMonth'] = df['Date'] + pd.offsets.MonthEnd(-1) + pd.offsets.Day(1)
df['YearMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day-1, unit='D')
df['YearMonth'] = df['Date'].map(lambda dt: dt.replace(day=1))
df['YearMonth'] = df['Date'].dt.normalize().map(pd.tseries.offsets.MonthBegin().rollback)

Then use groupby as normal:

g = df.groupby('YearMonth')
res = g['Values'].sum()
# YearMonth
# 2017-09-01    20
# 2017-10-01    30
# Name: Values, dtype: int64

Comparison with `pd.Grouper`

The subtle benefit of this solution is, unlike pd.Grouper, the grouper index is normalized to the beginning of each month rather than the end, and therefore you can easily extract groups via get_group:

some_group = g.get_group('2017-10-01')

Calculating the last day of October is slightly more cumbersome. pd.Grouper, as of v0.23, does support a convention parameter, but this is only applicable for a PeriodIndex grouper.
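The difference in group labels can be seen by running both approaches on the answer's sample data. This is a sketch only: it reuses one of the four normalisation methods above, and passes `pd.offsets.MonthEnd()` and `pd.offsets.MonthBegin()` objects as `freq` in place of string aliases. The `MonthBegin` variant is an extra option not used in the answer itself.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2017-10-05", "2017-10-20", "2017-10-01", "2017-09-01"]
    ),
    "Values": [5, 10, 15, 20],
})

# Normalised column: every date mapped to the first day of its month.
df["YearMonth"] = df["Date"] - pd.to_timedelta(df["Date"].dt.day - 1, unit="D")
start_labels = df.groupby("YearMonth")["Values"].sum()

# Grouper with a month-end frequency: groups are labelled by month end.
end_labels = df.groupby(
    pd.Grouper(key="Date", freq=pd.offsets.MonthEnd())
)["Values"].sum()

# Grouper with a month-begin frequency: groups are labelled by month start.
begin_labels = df.groupby(
    pd.Grouper(key="Date", freq=pd.offsets.MonthBegin())
)["Values"].sum()

# With month-start labels, the October group key is simply the 1st.
october = df.groupby("YearMonth").get_group(pd.Timestamp("2017-10-01"))

print(start_labels.index.tolist())
print(end_labels.index.tolist())
print(october)
```

With month-end labels, the equivalent lookup needs the last day of the month, which varies from month to month.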

Comparison with string conversion

An alternative to the above idea is to convert to a string, e.g. convert datetime 2017-10-XX to string '2017-10'. However, this is not recommended since you lose all the efficiency benefits of a datetime series (stored internally as numerical data in a contiguous memory block) versus an object series of strings (stored as an array of pointers).

See this answer for the proper way to utilize offsets when there are already day=1 values : stackoverflow.com/a/45831333/9987623.
@AlexK, does pd.tseries.offsets have an advantage over pd.tseries.MonthBegin?
Sorry, I don't know enough to tell those apart. I just added the comment because your df['YearMonth'] = df['Date'] - pd.offsets.MonthBegin(1) code above changes any date that is already the first of the month to the first of the previous month.
For beginning of month with Grouper, see also stackoverflow.com/a/56280791/10495893

tsando · Accepted Answer · 2018-10-11 10:58:29Z

11

Slightly alternative solution to @jpp's but outputting a YearMonth string:

df['YearMonth'] = pd.to_datetime(df['Date']).apply(lambda x: '{year}-{month}'.format(year=x.year, month=x.month))
res = df.groupby('YearMonth')['Values'].sum()

answered Oct 11, 2018 at 10:58

tsando


2 Comments

user2723494 Over a year ago

Helpful in the case where you don't want to set your date column as an index

Raisin Over a year ago

Can also generate strings using dt.strftime: df['Date'].dt.strftime('%Y %b.')
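A short sketch of the strftime variant from the last comment, using the sample data from the earlier answer. The `'%Y-%m'` format is an assumption chosen here for illustration: it zero-pads the month, so the string keys sort chronologically, whereas the unpadded `'{year}-{month}'` form above yields `'2017-9'`, which sorts after `'2017-10'`.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2017-10-05", "2017-10-20", "2017-10-01", "2017-09-01"]
    ),
    "Values": [5, 10, 15, 20],
})

# Zero-padded year-month string, e.g. '2017-09'.
df["YearMonth"] = df["Date"].dt.strftime("%Y-%m")
res = df.groupby("YearMonth")["Values"].sum()
print(res)
```

As the earlier answer points out, the keys are now plain strings, so datetime operations on the group labels are no longer available.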
