Return to Answer

replaced http://stackoverflow.com/ with https://stackoverflow.com/

edited May 23, 2017 at 12:16

URL Rewriter Bot

Using the strides views concept on dataframestrides views concept on dataframe, here's a vectorized approach -

get_sliding_window(df, 2).dot(X) # window size = 2

Runtime test -

In [101]: df = pd.DataFrame(np.random.rand(5, 2).round(2), columns=['A', 'B']) In [102]: X = np.array([2, 3]) In [103]: rolled_df = roll(df, 2) In [104]: %timeit rolled_df.apply(lambda df: pd.Series(df.values.dot(X))) 100 loops, best of 3: 5.51 ms per loop In [105]: %timeit get_sliding_window(df, 2).dot(X) 10000 loops, best of 3: 43.7 µs per loop

Verify results -

In [106]: rolled_df.apply(lambda df: pd.Series(df.values.dot(X))) Out[106]: 0 1 1 2.70 4.09 2 4.09 2.52 3 2.52 1.78 4 1.78 3.50 In [107]: get_sliding_window(df, 2).dot(X) Out[107]: array([[ 2.7 , 4.09], [ 4.09, 2.52], [ 2.52, 1.78], [ 1.78, 3.5 ]])

Huge improvement there, which I am hoping would stay noticeable on larger arrays!

Using the strides views concept on dataframe, here's a vectorized approach -

get_sliding_window(df, 2).dot(X) # window size = 2

Runtime test -

In [101]: df = pd.DataFrame(np.random.rand(5, 2).round(2), columns=['A', 'B']) In [102]: X = np.array([2, 3]) In [103]: rolled_df = roll(df, 2) In [104]: %timeit rolled_df.apply(lambda df: pd.Series(df.values.dot(X))) 100 loops, best of 3: 5.51 ms per loop In [105]: %timeit get_sliding_window(df, 2).dot(X) 10000 loops, best of 3: 43.7 µs per loop

Verify results -

In [106]: rolled_df.apply(lambda df: pd.Series(df.values.dot(X))) Out[106]: 0 1 1 2.70 4.09 2 4.09 2.52 3 2.52 1.78 4 1.78 3.50 In [107]: get_sliding_window(df, 2).dot(X) Out[107]: array([[ 2.7 , 4.09], [ 4.09, 2.52], [ 2.52, 1.78], [ 1.78, 3.5 ]])

Huge improvement there, which I am hoping would stay noticeable on larger arrays!

Using the strides views concept on dataframe, here's a vectorized approach -

get_sliding_window(df, 2).dot(X) # window size = 2

Runtime test -

In [101]: df = pd.DataFrame(np.random.rand(5, 2).round(2), columns=['A', 'B']) In [102]: X = np.array([2, 3]) In [103]: rolled_df = roll(df, 2) In [104]: %timeit rolled_df.apply(lambda df: pd.Series(df.values.dot(X))) 100 loops, best of 3: 5.51 ms per loop In [105]: %timeit get_sliding_window(df, 2).dot(X) 10000 loops, best of 3: 43.7 µs per loop

Verify results -

In [106]: rolled_df.apply(lambda df: pd.Series(df.values.dot(X))) Out[106]: 0 1 1 2.70 4.09 2 4.09 2.52 3 2.52 1.78 4 1.78 3.50 In [107]: get_sliding_window(df, 2).dot(X) Out[107]: array([[ 2.7 , 4.09], [ 4.09, 2.52], [ 2.52, 1.78], [ 1.78, 3.5 ]])

Huge improvement there, which I am hoping would stay noticeable on larger arrays!

Source Link

answered Dec 31, 2016 at 9:09

Divakar

222.1k
19
273
374

Using the strides views concept on dataframe, here's a vectorized approach -

get_sliding_window(df, 2).dot(X) # window size = 2

Runtime test -

In [101]: df = pd.DataFrame(np.random.rand(5, 2).round(2), columns=['A', 'B']) In [102]: X = np.array([2, 3]) In [103]: rolled_df = roll(df, 2) In [104]: %timeit rolled_df.apply(lambda df: pd.Series(df.values.dot(X))) 100 loops, best of 3: 5.51 ms per loop In [105]: %timeit get_sliding_window(df, 2).dot(X) 10000 loops, best of 3: 43.7 µs per loop

Verify results -

In [106]: rolled_df.apply(lambda df: pd.Series(df.values.dot(X))) Out[106]: 0 1 1 2.70 4.09 2 4.09 2.52 3 2.52 1.78 4 1.78 3.50 In [107]: get_sliding_window(df, 2).dot(X) Out[107]: array([[ 2.7 , 4.09], [ 4.09, 2.52], [ 2.52, 1.78], [ 1.78, 3.5 ]])

Huge improvement there, which I am hoping would stay noticeable on larger arrays!

Collectives™ on Stack Overflow

Return to Answer