1

I am new to python and have the following problem:

I have daily measurements that I imported into a pd.series object. The problem is that the data is such that there are always 31 measurements per month, irrespective of whether it is January or February. In case the respective month has less than 31 days, the measurements are set to zero for those days that exceed the last day of that month. However, missing data within a month is also set to zero. The data for the months of April and May looks like this.

1990-04-01 25.870 1990-04-01 26.205 1990-04-01 12.283 1990-04-01 19.630 1990-04-01 19.239 1990-04-01 23.614 1990-04-01 40.891 1990-04-01 41.152 1990-04-01 35.935 1990-04-01 25.682 1990-04-01 21.674 1990-04-01 15.818 1990-04-01 11.413 1990-04-01 16.522 1990-04-01 33.543 1990-04-01 28.727 1990-04-01 18.043 1990-04-01 10.326 1990-04-01 19.159 1990-04-01 21.848 1990-04-01 35.250 1990-04-01 39.152 1990-04-01 31.522 1990-04-01 23.152 1990-04-01 13.250 1990-04-01 20.705 1990-04-01 27.304 1990-04-01 24.478 1990-04-01 33.674 1990-04-01 32.591 1990-04-01 0.000 1990-05-01 40.370 1990-05-01 41.609 1990-05-01 47.478 1990-05-01 40.682 1990-05-01 42.587 1990-05-01 38.826 1990-05-01 35.543 1990-05-01 30.955 1990-05-01 23.543 1990-05-01 7.857 1990-05-01 0.000 1990-05-01 0.000 1990-05-01 0.000 1990-05-01 0.000 1990-05-01 0.000 1990-05-01 0.000 1990-05-01 54.133 1990-05-01 41.114 1990-05-01 44.739 1990-05-01 43.848 1990-05-01 26.739 1990-05-01 21.318 1990-05-01 26.750 1990-05-01 54.864 1990-05-01 33.000 1990-05-01 33.304 1990-05-01 34.304 1990-05-01 20.886 1990-05-01 20.250 1990-05-01 24.804 1990-05-01 28.091 Length: 62 

Is there a way to drop those entries that do not belong to the respective months and provide a new time-index for each day? I need to keep the zeros within a month.

1
  • I'm taking a bit of a punt here, because I don't have pandas installed, but what happens if you try s[['1990-04-01', '1990-04-02']] ? (Assuming your series is called s, of course.) If it gives you something sensible then the problem is now a much more tractable one of generating a list of all the valid indices. Commented Nov 22, 2013 at 17:19

1 Answer 1

1

First I would replace the 0s with NaN (to symbolise missing data):

s.replace(0, np.nan, inplace=True) 

One way is to create a function which acts on each group (month):

def f(s_month): date = s_month.index[0] # create the month long range rng = pd.date_range(date, date + pd.offsets.MonthEnd(1), freq='D') # throw away results longer than month length s_month = s_month.iloc[0:len(rng)] # reindex those remaining s_month.index = rng return s_month 

Note: this requires that you have a DatetimeIndex, i.e. s.index = pd.to_datetime(s.index).

In [11]: s_month = s.loc['1990-04-01'] In [12]: f(s_month) Out[12]: 1990-04-01 25.870 1990-04-02 26.205 1990-04-03 12.283 1990-04-04 19.630 ... 1990-04-28 24.478 1990-04-29 33.674 1990-04-30 32.591 Freq: D, Name: Value, dtype: float64 

Use this with a groupby apply:

In [13]: res = s.groupby(s.index).apply(f) In [14]: res Out[14]: 1990-04-01 1990-04-01 25.870 1990-04-02 26.205 1990-04-03 12.283 1990-04-04 19.630 ... 

Correct the MultiIndex:

In [15]: res.index = res.index.droplevel(0) In [16]: res Out[16]: 1990-04-01 25.870 1990-04-02 26.205 1990-04-03 12.283 ... 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.