I am trying to calculate Volume Weighted Average Price on a rolling basis.

I have a function vwap that computes it for a given set of bars:

    def vwap(bars):
        return ((bars.Close*bars.Volume).sum()/bars.Volume.sum()).round(2)

When I try to use this function with rolling_apply, as shown, I get an error:

    import pandas
    import pandas.io.data as web

    bars = web.DataReader('AAPL','yahoo')
    print pandas.rolling_apply(bars,30,vwap)

    AttributeError: 'numpy.ndarray' object has no attribute 'Close'

The error makes sense to me: rolling_apply expects a Series or an ndarray as input, not a DataFrame, which is what I am passing, so vwap receives a plain ndarray with no Close attribute.

Is there a way to use rolling_apply on a DataFrame to solve my problem?

3 Answers

This is not directly supported, but you can do it like this:

    In [29]: bars
    Out[29]:
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 942 entries, 2010-01-04 00:00:00 to 2013-09-30 00:00:00
    Data columns (total 6 columns):
    Open         942  non-null values
    High         942  non-null values
    Low          942  non-null values
    Close        942  non-null values
    Volume       942  non-null values
    Adj Close    942  non-null values
    dtypes: float64(5), int64(1)

    window=30

    In [30]: concat([ (Series(vwap(bars.iloc[i:i+window]), index=[bars.index[i+window]])) for i in xrange(len(bars)-window) ])
    Out[30]:
    2010-02-17    203.21
    2010-02-18    202.95
    2010-02-19    202.64
    2010-02-22    202.41
    2010-02-23    202.19
    2010-02-24    201.85
    2010-02-25    201.65
    2010-02-26    201.50
    2010-03-01    201.31
    2010-03-02    201.35
    2010-03-03    201.42
    2010-03-04    201.09
    2010-03-05    200.95
    2010-03-08    201.50
    2010-03-09    202.02
    ...
    2013-09-10    485.94
    2013-09-11    487.38
    2013-09-12    486.77
    2013-09-13    487.23
    2013-09-16    487.20
    2013-09-17    486.09
    2013-09-18    485.52
    2013-09-19    485.30
    2013-09-20    485.37
    2013-09-23    484.87
    2013-09-24    485.81
    2013-09-25    486.41
    2013-09-26    486.07
    2013-09-27    485.30
    2013-09-30    484.74
    Length: 912
1 Comment

Nice solution, was helpful to me! Question though: In your list comprehension, wouldn't you use bars.iloc[i:i+window+1] since .iloc excludes upper bound? With your code, only 29 values are used in the calculation ending at bars.iloc[i+window-1], while bars.index[i+window] is used as the label. In this sort of calculation, I would think you would want bars.iloc[i+window] included in the calculation.
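On the slice bounds raised in this comment, a quick check on a toy frame may help. The 40-row frame below is made up for illustration and only stands in for the real bars. It suggests that bars.iloc[i:i+window] covers window rows, positions i through i+window-1, so the label bars.index[i+window] is the row just after the window rather than its last row:

```python
import pandas as pd

# Hypothetical toy data standing in for the real price bars: 40 daily rows
bars = pd.DataFrame(
    {"Close": range(40)},
    index=pd.date_range("2010-01-04", periods=40, freq="D"),
)

window = 30
i = 0

chunk = bars.iloc[i:i + window]             # positions i .. i+window-1
n_rows = len(chunk)                         # how many rows the window holds
last_in_window = chunk.index[-1]            # label of the last row actually used
label_used = bars.index[i + window]         # label used in the answer's code
label_aligned = bars.index[i + window - 1]  # label of the window's last row

print(n_rows, last_in_window, label_used, label_aligned)
```

If the intent is to label each value with the last day of its window, indexing with bars.index[i+window-1] and looping over len(bars)-window+1 start positions would be one way to do it.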

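A cleaned-up version for reference; hopefully I got the indexing right:

    import numpy as np
    import pandas

    def myrolling_apply(df, N, f, nn=1):
        # Start position of each window of N rows, stepping by nn
        ii = [int(x) for x in np.arange(0, df.shape[0] - N + 1, nn)]
        # Apply f to each window of the DataFrame
        out = [f(df.iloc[i:(i + N)]) for i in ii]
        out = pandas.Series(out)
        # Label each result with the last row of its window
        out.index = df.index[N-1::nn]
        return out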
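As a usage sketch, here is how this helper might be combined with the vwap function from the question. The five-row frame is made-up data, not real AAPL prices, and the helper is restated in Python 3 form so the snippet runs on its own:

```python
import pandas as pd

def vwap(bars):
    # Volume-weighted average price over the rows passed in (from the question)
    return ((bars.Close * bars.Volume).sum() / bars.Volume.sum()).round(2)

def myrolling_apply(df, N, f, nn=1):
    # Start position of each window of N rows, stepping by nn
    starts = range(0, df.shape[0] - N + 1, nn)
    out = [f(df.iloc[i:i + N]) for i in starts]
    # Label each result with the last row of its window
    return pd.Series(out, index=df.index[N - 1::nn])

# Hypothetical toy data, for illustration only
bars = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0, 14.0],
     "Volume": [100, 200, 300, 400, 500]},
    index=pd.date_range("2013-09-24", periods=5, freq="D"),
)

result = myrolling_apply(bars, 3, vwap)
print(result)
```

With a window of 3 over 5 rows there are three results, labelled with the third, fourth and fifth dates.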

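I modified @mathtick's answer to include na_fill. Also note that your function f needs to return a single value; it can't return a DataFrame with multiple columns.

    import numpy as np
    import pandas as pd

    def rolling_apply_df(dfg, N, f, nn=1, na_fill=True):
        # Start position of each window of N rows, stepping by nn
        ii = [int(x) for x in np.arange(0, dfg.shape[0] - N + 1, nn)]
        # f must return a single value per window
        out = [f(dfg.iloc[i:(i + N)]) for i in ii]
        if na_fill:
            # Pad the first N-1 positions with NaN so the result lines up
            # with dfg.index (the lengths only match when nn == 1)
            out = pd.Series(np.concatenate([np.repeat(np.nan, N-1), np.array(out)]))
            out.index = dfg.index[::nn]
        else:
            # Label each result with the last row of its window
            out = pd.Series(out)
            out.index = dfg.index[N-1::nn]
        return out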
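For VWAP specifically, pandas versions that provide the .rolling() method allow a loop-free alternative: the rolling VWAP is the ratio of two rolling sums. This is a different technique from the per-window apply used in the answers above, sketched here on made-up data; the name rolling_vwap is hypothetical.

```python
import pandas as pd

def rolling_vwap(bars, window):
    # Rolling sum of price * volume, divided by the rolling sum of volume
    pv = (bars["Close"] * bars["Volume"]).rolling(window).sum()
    v = bars["Volume"].rolling(window).sum()
    # The first window-1 rows have no full window and come out as NaN
    return (pv / v).round(2)

# Hypothetical toy data, for illustration only
bars = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0, 14.0],
     "Volume": [100, 200, 300, 400, 500]},
    index=pd.date_range("2013-09-24", periods=5, freq="D"),
)

result = rolling_vwap(bars, 3)
print(result)
```

The result keeps the full index of bars, with NaN in the leading rows, which matches what na_fill=True aims for when nn is 1.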
