Description
Research
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/a/71803558/277716
Question about pandas
I linked a complete StackOverflow answer that tackles the problem of computing the equivalent of pandas.core.window.rolling.Rolling.max when the window is an arbitrary column of integers in the same dataframe. Although that solution strives to be vectorized, it is extremely slow compared with the basic case of a constant window size, to the point of being unusable for large dataframes. I suspect it may be impossible to make it fast, because SIMD hardware may prefer a constant window size.
However, I wonder whether the pandas developers have ideas about how to do this, since they are the ones who wrote the extremely fast (vectorized) pandas.core.window.rolling.Rolling.max.
This would normally be a feature request for pandas.DataFrame.rolling to accept a column of arbitrary integers from the dataframe as the window, but I don't know whether that can even be done performantly.
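To make the question concrete: when the largest window is small, one option is to loop over lag offsets rather than over rows, so each pass is a single vectorized NumPy operation. The following is only a sketch under stated assumptions, not a pandas API and not the approach from the linked answer. The name `variable_window_max` is hypothetical, windows are assumed to be integers of at least 1, and the data is assumed to contain no NaN. The cost is roughly O(n × max window), so it is not expected to match the constant-window case.

```python
import numpy as np

def variable_window_max(values, windows):
    """Trailing max where row i looks back over windows[i] rows, itself included.

    Hypothetical helper for illustration; assumes windows >= 1 and no NaN.
    """
    values = np.asarray(values, dtype=float)
    windows = np.asarray(windows, dtype=np.int64)
    n = len(values)
    # Clamp each window so it never reaches before the first row.
    windows = np.minimum(windows, np.arange(n) + 1)
    result = values.copy()
    # One vectorized pass per lag offset k, instead of one pass per row.
    for k in range(1, int(windows.max(initial=1))):
        # Rows whose window still covers the value k positions back.
        idx = np.nonzero(windows > k)[0]
        result[idx] = np.maximum(result[idx], values[idx - k])
    return result

print(variable_window_max([1, 5, 2, 4, 3], [1, 2, 3, 1, 2]))
```

With a dataframe this would be called as `variable_window_max(df["Data"], df["Window"])`. If the windows can be large, the number of passes grows with them and the advantage over a row-wise loop disappears.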
Bug related to later comments below
    import pandas as pd
    from pandas import api
    import numpy as np

    class MultiWindowIndexer(api.indexers.BaseIndexer):
        def __init__(self, window):
            # One window length per row
            self.window = np.array(window)
            super().__init__()

        def get_window_bounds(self, num_values, min_periods, center, closed):
            # Each window ends at the current row (inclusive) and starts
            # window[i] rows earlier, clipped to the start of the data
            end = np.arange(num_values, dtype='int64') + 1
            start = np.clip(end - self.window, 0, num_values)
            return start, end

    np.random.seed([3,14])
    a = np.random.randn(20).cumsum()
    w = np.minimum(
        np.random.randint(1, 4, size=a.shape),
        np.arange(len(a))+1
    )
    df = pd.DataFrame({'Data': a, 'Window': w})
    df['max1'] = df.Data.rolling(MultiWindowIndexer(df.Window)).max(engine='cython')
    print(df)

Expected outcome: index 18 max1 should be -1.487828 instead of -1.932612
The source of the code and further discussion of the bug are on StackOverflow.
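One way to check a result such as the expected value above independently of the rolling machinery is a deliberately slow row-by-row reference. This is a minimal sketch, assuming windows of at least 1 and the same "window ends at the current row" convention as `MultiWindowIndexer`. The name `rolling_max_reference` is hypothetical.

```python
import numpy as np

def rolling_max_reference(values, windows):
    """Row-by-row trailing max; slow, but easy to verify by eye.

    Hypothetical helper for illustration; assumes every window is >= 1.
    """
    values = np.asarray(values, dtype=float)
    windows = np.asarray(windows, dtype=np.int64)
    out = np.empty(len(values), dtype=float)
    for i, w in enumerate(windows):
        # Window covers rows start..i inclusive, clipped at the first row.
        start = max(i + 1 - int(w), 0)
        out[i] = values[start:i + 1].max()
    return out

print(rolling_max_reference([4.0, 1.0, 3.0, 2.0], [1, 2, 2, 3]))
```

Applied to the example above, `np.flatnonzero(~np.isclose(df["max1"], rolling_max_reference(df["Data"], df["Window"])))` would list the rows where the two disagree. Whether any rows differ depends on the installed pandas version.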