ERR: too strict validation on groupby.rolling with time-aware freq #15130

@jreback

http://stackoverflow.com/questions/41642320/efficient-pandas-rolling-aggregation-over-date-range-by-group-python-2-7-windo/41643179?noredirect=1#comment70486923_41643179

In [1]: data = [
   ...:     ['David', '1/1/2015', 100], ['David', '1/5/2015', 500], ['David', '5/30/2015', 50], ['David', '7/25/2015', 50],
   ...:     ['Ryan', '1/4/2014', 100], ['Ryan', '1/19/2015', 500], ['Ryan', '3/31/2016', 50],
   ...:     ['Joe', '7/1/2015', 100], ['Joe', '9/9/2015', 500], ['Joe', '10/15/2015', 50]
   ...: ]
   ...:
   ...: list_of_vals = []
   ...:
   ...: dates_df = pd.DataFrame(data=data, columns=['name', 'date', 'amount'], index=None)
   ...: dates_df['date'] = pd.to_datetime(dates_df['date'])
   ...:

The monotonic check on the 'on' column doesn't need to occur when we are grouping, yet it raises here:

In [7]: dates_df.groupby('name').rolling('180D', on='date')['amount'].sum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-8896cb99a66a> in <module>()
----> 1 dates_df.groupby('name').rolling('180D', on='date')['amount'].sum()

/Users/jreback/pandas/pandas/core/groupby.py in rolling(self, *args, **kwargs)
   1148         """
   1149         from pandas.core.window import RollingGroupby
-> 1150         return RollingGroupby(self, *args, **kwargs)
   1151
   1152     @Substitution(name='groupby')

/Users/jreback/pandas/pandas/core/window.py in __init__(self, obj, *args, **kwargs)
    635         self._groupby.mutated = True
    636         self._groupby.grouper.mutated = True
--> 637         super(GroupByMixin, self).__init__(obj, *args, **kwargs)
    638
    639     count = GroupByMixin._dispatch('count')

/Users/jreback/pandas/pandas/core/window.py in __init__(self, obj, window, min_periods, freq, center, win_type, axis, on, **kwargs)
     76         self.win_type = win_type
     77         self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 78         self.validate()
     79
     80     @property

/Users/jreback/pandas/pandas/core/window.py in validate(self)
   1030             formatted = self.on or 'index'
   1031             raise ValueError("{0} must be "
-> 1032                              "monotonic".format(formatted))
   1033
   1034         from pandas.tseries.frequencies import to_offset

ValueError: date must be monotonic
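A minimal sketch of why the check is stricter than it needs to be for this data: the 'date' column is not monotonic across the whole frame, but it is monotonic inside every 'name' group. The variable names `whole_frame_monotonic` and `per_group_monotonic` are illustrative only, and the explicit `format=` argument is an assumption added to make date parsing unambiguous.

```python
import pandas as pd

data = [
    ['David', '1/1/2015', 100], ['David', '1/5/2015', 500],
    ['David', '5/30/2015', 50], ['David', '7/25/2015', 50],
    ['Ryan', '1/4/2014', 100], ['Ryan', '1/19/2015', 500],
    ['Ryan', '3/31/2016', 50],
    ['Joe', '7/1/2015', 100], ['Joe', '9/9/2015', 500],
    ['Joe', '10/15/2015', 50],
]
dates_df = pd.DataFrame(data, columns=['name', 'date', 'amount'])
# Explicit month/day/year format (an assumption; the issue relies on inference).
dates_df['date'] = pd.to_datetime(dates_df['date'], format='%m/%d/%Y')

# Across the whole frame the dates jump back to 2014 when Ryan's rows start.
whole_frame_monotonic = dates_df['date'].is_monotonic_increasing

# Within each group the dates only ever increase.
per_group_monotonic = dates_df.groupby('name')['date'].apply(
    lambda s: s.is_monotonic_increasing
)

print(whole_frame_monotonic)
print(per_group_monotonic)
```

Under these assumptions the frame-level flag is false while every per-group flag is true, which is the situation the ValueError above rejects.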

Applying the same rolling window inside each group via apply is ok:

In [9]: dates_df.groupby('name').apply(lambda x: x.rolling('180D', on='date')['amount'].sum())
Out[9]:
name
David  0    100.0
       1    600.0
       2    650.0
       3    100.0
Joe    7    100.0
       8    600.0
       9    650.0
Ryan   4    100.0
       5    500.0
       6     50.0
Name: amount, dtype: float64
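Until the validation is relaxed, the per-group computation can be wrapped in a small helper. This is only a sketch: `grouped_time_rolling_sum` is a hypothetical name, not part of pandas, and it spells the per-group step out as an explicit loop with `pd.concat` rather than `apply`, so the shape of the result does not depend on how `apply` handles group keys in a given pandas version.

```python
import pandas as pd


def grouped_time_rolling_sum(df, by, on, value, window):
    """Hypothetical helper: time-based rolling sum computed group by group.

    Each group is rolled separately, so `on` only has to be monotonic
    within a group, not across the whole frame.
    """
    pieces = {
        key: group.rolling(window, on=on)[value].sum()
        for key, group in df.groupby(by)
    }
    # Stack the per-group results under an outer index level named `by`.
    return pd.concat(pieces, names=[by])


data = [
    ['David', '1/1/2015', 100], ['David', '1/5/2015', 500],
    ['David', '5/30/2015', 50], ['David', '7/25/2015', 50],
    ['Ryan', '1/4/2014', 100], ['Ryan', '1/19/2015', 500],
    ['Ryan', '3/31/2016', 50],
    ['Joe', '7/1/2015', 100], ['Joe', '9/9/2015', 500],
    ['Joe', '10/15/2015', 50],
]
dates_df = pd.DataFrame(data, columns=['name', 'date', 'amount'])
dates_df['date'] = pd.to_datetime(dates_df['date'], format='%m/%d/%Y')

result = grouped_time_rolling_sum(dates_df, 'name', 'date', 'amount', '180D')
print(result)
```

The values should match the Out[9] output above, with 'name' as the outer index level and the original row labels as the inner one.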

Labels: Bug, Error Reporting, Groupby, Reshaping
