
Using the strides views concept on dataframe, here's a vectorized approach:

    get_sliding_window(df, 2).dot(X)  # window size = 2
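`get_sliding_window` is defined in the linked strides post, not here. For readers who want something to run, here is a rough sketch of what such a helper might look like. The name `sliding_window_sketch` and the plain-array demo data are my own assumptions for illustration, not the original function; it builds a read-only strided view of shape `(rows - W + 1, W, cols)`, so `.dot(X)` contracts over the column axis.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def sliding_window_sketch(data, W):
    """Hypothetical stand-in for get_sliding_window (an assumption, not the original).

    Returns a read-only view of shape (m - W + 1, W, n) whose i-th entry
    holds rows i .. i + W - 1 of the input, without copying the data.
    """
    a = np.asarray(data)  # a homogeneous DataFrame converts the same way
    m, n = a.shape
    s0, s1 = a.strides
    # Stepping along the window axis reuses the row stride s0,
    # which is what makes consecutive windows overlap.
    return as_strided(a, shape=(m - W + 1, W, n),
                      strides=(s0, s0, s1), writeable=False)


# Small deterministic demo (made-up data, not the timings input)
a = np.arange(10, dtype=float).reshape(5, 2)
X = np.array([2, 3])

windows = sliding_window_sketch(a, 2)
result = windows.dot(X)  # one row per window, one column per row inside the window
```

`writeable=False` is there because a strided view shares memory between overlapping windows, so writing into it could silently change several windows at once.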

Runtime test -

    In [101]: df = pd.DataFrame(np.random.rand(5, 2).round(2), columns=['A', 'B'])

    In [102]: X = np.array([2, 3])

    In [103]: rolled_df = roll(df, 2)

    In [104]: %timeit rolled_df.apply(lambda df: pd.Series(df.values.dot(X)))
    100 loops, best of 3: 5.51 ms per loop

    In [105]: %timeit get_sliding_window(df, 2).dot(X)
    10000 loops, best of 3: 43.7 µs per loop

Verify results -

    In [106]: rolled_df.apply(lambda df: pd.Series(df.values.dot(X)))
    Out[106]:
          0     1
    1  2.70  4.09
    2  4.09  2.52
    3  2.52  1.78
    4  1.78  3.50

    In [107]: get_sliding_window(df, 2).dot(X)
    Out[107]:
    array([[ 2.7 ,  4.09],
           [ 4.09,  2.52],
           [ 2.52,  1.78],
           [ 1.78,  3.5 ]])

That's a speedup of over 100x on this small input (5.51 ms vs 43.7 µs), which I'd hope stays noticeable on larger arrays!

Divakar