
Here's a piece of code. I don't understand why the last column, rm-5, is NaN for the first 4 items.

I understand that in the rm column the first 4 items aren't filled because there is not enough data yet, but if I shift the column first, the calculation should be possible, shouldn't it?

Similarly, I don't understand why there are 5 NaN items, not 4, at the end of the rm-5 column.

```python
import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=100, freq='D')
df = pd.DataFrame(data=np.random.randn(100), index=index, columns=['A'])
# Older pandas: pd.rolling_mean(df['A'], 5); current pandas: Series.rolling(5).mean()
df['rm'] = df['A'].rolling(5).mean()
df['rm-5'] = df['A'].shift(-5).rolling(5).mean()
print(df.head(n=8))
print(df.tail(n=8))
```

```
                   A        rm      rm-5
2000-01-01  0.109161       NaN       NaN
2000-01-02 -0.360286       NaN       NaN
2000-01-03 -0.092439       NaN       NaN
2000-01-04  0.169439       NaN       NaN
2000-01-05  0.185829  0.002341  0.091736
2000-01-06  0.432599  0.067028  0.295949
2000-01-07 -0.374317  0.064222  0.055903
2000-01-08  1.258054  0.334321 -0.132972

                   A        rm      rm-5
2000-04-02  0.499860 -0.422931 -0.140111
2000-04-03 -0.868718 -0.458962 -0.182373
2000-04-04  0.081059 -0.443494 -0.040646
2000-04-05  0.500275 -0.093048       NaN
2000-04-06 -0.253915 -0.008288       NaN
2000-04-07 -0.159256 -0.140111       NaN
2000-04-08 -1.080027 -0.182373       NaN
2000-04-09  0.789690 -0.040646       NaN
```
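A minimal deterministic reproduction makes the NaN pattern easy to count. This sketch assumes a current pandas version, where `Series.rolling(5).mean()` takes the place of the removed `pd.rolling_mean`, and replaces the random data with the values 0.0 to 11.0 (an illustrative choice, not from the question):

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the random column: the values 0.0, 1.0, ..., 11.0
s = pd.Series(np.arange(12, dtype=float))

shifted = s.shift(-5)             # values move up 5 rows; the last 5 rows become NaN
rm_5 = shifted.rolling(5).mean()  # min_periods defaults to 5, so every window needs 5 non-NaN values

mask = rm_5.isna().to_numpy()
leading_nan = int(np.argmax(~mask))         # position of the first non-NaN value
trailing_nan = int(np.argmax(~mask[::-1]))  # same count, taken from the end
print(leading_nan, trailing_nan)  # 4 5
print(rm_5.dropna().tolist())     # [7.0, 8.0, 9.0]
```

With the default `min_periods` (equal to the window size), the 4 leading NaNs come from the rolling window itself, and the 5 trailing NaNs come from `shift(-5)`.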

1 Answer


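You can change the order of operations. Right now you shift first and take the rolling mean afterwards. The `shift(-5)` moves every value up five rows, which creates 5 NaNs at the end of the shifted column; the rolling mean then still needs 5 valid values per window, so the first 4 rows stay NaN as well. If you take the rolling mean first and shift afterwards, the 4 leading NaNs of the rolling mean are shifted out of the frame and only the 5 trailing NaNs remain. From 2000-01-05 onward both columns are identical: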

```python
import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=100, freq='D')
df = pd.DataFrame(data=np.random.randn(100), index=index, columns=['A'])
df['rm'] = df['A'].rolling(5).mean()
df['shift'] = df['A'].shift(-5)
# Shift first, then take the rolling mean (the order used in the question)
df['rm-5-shift_first'] = df['A'].shift(-5).rolling(5).mean()
# Take the rolling mean first, then shift
df['rm-5-mean_first'] = df['A'].rolling(5).mean().shift(-5)
print(df.head(n=8))
print(df.tail(n=8))
```

```
                   A        rm     shift rm-5-shift_first rm-5-mean_first
2000-01-01 -0.120808       NaN  0.830231              NaN        0.184197
2000-01-02  0.029547       NaN  0.047451              NaN        0.187778
2000-01-03  0.002652       NaN  1.040963              NaN        0.395440
2000-01-04 -1.078656       NaN -1.118723              NaN        0.387426
2000-01-05  1.137210 -0.006011  0.469557         0.253896        0.253896
2000-01-06  0.830231  0.184197 -0.390506         0.009748        0.009748
2000-01-07  0.047451  0.187778 -1.624492        -0.324640       -0.324640
2000-01-08  1.040963  0.395440 -1.259306        -0.784694       -0.784694

                   A        rm     shift rm-5-shift_first rm-5-mean_first
2000-04-02 -1.283123 -0.270381  0.226257         0.760370        0.760370
2000-04-03  1.369342  0.288072  2.367048         0.959912        0.959912
2000-04-04  0.003363  0.299997  1.143513         1.187941        1.187941
2000-04-05  0.694026  0.400442       NaN              NaN             NaN
2000-04-06  1.508863  0.458494       NaN              NaN             NaN
2000-04-07  0.226257  0.760370       NaN              NaN             NaN
2000-04-08  2.367048  0.959912       NaN              NaN             NaN
2000-04-09  1.143513  1.187941       NaN              NaN             NaN
```
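The difference between the two orders can also be checked on a small deterministic series. This is a sketch assuming a current pandas version (`Series.rolling(5).mean()` in place of `pd.rolling_mean`) and the illustrative values 0.0 to 11.0 instead of random data:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the random column: the values 0.0, 1.0, ..., 11.0
a = pd.Series(np.arange(12, dtype=float))

shift_first = a.shift(-5).rolling(5).mean()  # shift, then average (the question's order)
mean_first = a.rolling(5).mean().shift(-5)   # average, then shift

# Where both are defined, row i holds the mean of a[i+1] .. a[i+5] in either column
both = shift_first.notna() & mean_first.notna()
print(np.allclose(shift_first[both], mean_first[both]))  # True

# Shift-first loses 4 leading and 5 trailing rows; mean-first only the 5 trailing ones
print(int(shift_first.isna().sum()), int(mean_first.isna().sum()))  # 9 5
print(mean_first.iloc[0])  # 3.0, the mean of a[1] .. a[5]
```

In both orders the last 5 rows stay NaN, because there are no later observations to average; only the leading NaNs depend on the order.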

For more, see:

http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.shift.html
