1

For the problem, I want to calculate the mean for a particular member with a time window of 3 years. For example for the following data frame:

member_id Feature 1 Feature 2 Feature 3 Date
1 0.1 0.5 0.2 1/2/20
1 0.2 0.3 0.3 1/2/18
1 0.3 0.2 0.2 1/2/16
1 0.1 0.2 0.1 1/4/17
2 0.4 0.1 0.4 1/2/18
2 0.5 0.1 0.2 1/2/15

Taking into consideration of member_id = 1, there are different dates of admissions but we would want to consider only those whose time-window lies in the 3-year gap.

So, for the first row, the member_id = 1 with the date 1/2/20 would have the window from 1/2/20 - 1/2/17. Thus, while taking the aggregate value for the feature, rows 1,2, and 4 would be selected excluding the 3rd row where the date is 1/2/16.

I want the first row values to be like:

member_id Feature 1_mean Feature 2_mean Feature 3_mean Date
1 0.13 0.33 0.2 1/2/20
1 0.2 0.3 0.3 1/2/18
1 0.3 0.2 0.2 1/2/16
1 0.1 0.2 0.1 1/4/17
2 0.4 0.1 0.4 1/2/18
2 0.5 0.1 0.2 1/2/15

Now, similarly the same for row values 2 and 3 and so forth. The best way I thought would be to, first group by the members and then sort the data. After that iterate through each row calculating the mean.

I had written a sample code but it grouped it by the date causing multiple additional rows.

df['date'] = pd.to_datetime(df['date']) results = dict() for start in df['date']: end = start.date() - relativedelta(years=3) screen = (df.date <= start) & (df.date >= end) # screen for 3 years of data singles = df.loc[screen, ['Feature 1', 'Feature 2', 'Feature 3' , 'member_id']] # screen for 3 year window member_id by groups x = singles.groupby('member_id').mean() results[start] = x results = pd.concat(results) 

What is the best way to achieve the result?

1 Answer 1

0

Proceed as follows:

  1. Define a function to get mean features for particular row from the current group, for the required period:

    def getRowMeans(row, grp): dTo = row.Date return grp[grp.Date.between(dTo - pd.DateOffset(years=3), dTo)]\ .loc[:, 'Feature 1' : 'Feature 3'].mean() 

    The idea is:

    • the current object is a row, from a group of rows (for some member_id),
    • for the operations below the whole group must also be known, so it is another parameter of this function (grp),
    • from grp take rows for 3 years preceding from the Date from the current row (including this date),
    • from these rows take all 3 Feature columns and return their means.
  2. Define a function, to be called for each group of rows (for some member_id), returning a copy of this group with all Feature columns replaced with their means (generated by getRowMeans):

    def FeaturesToMeans(grp): means = grp.apply(getRowMeans, axis=1, grp=grp) rv = grp.copy() rv.update(means) return rv 

    The first step is to compute feature means.

    In order not to alter the original group, the object to be finally returned (rv) must be created as a copy of the original group.

    Then it is updated with the just computed means. Note however that update operates in place and does not return any result.

    The returned object is the updated group.

  3. Generate the actual result, as a new DataFrame:

    result = df.groupby('member_id', group_keys=False).apply(FeaturesToMeans) 

The result, for your sample data, is:

 member_id Feature 1 Feature 2 Feature 3 Date 0 1 0.133333 0.333333 0.20 2020-01-02 1 1 0.200000 0.233333 0.20 2018-01-02 2 1 0.300000 0.200000 0.20 2016-01-02 3 1 0.200000 0.200000 0.15 2017-01-04 4 2 0.450000 0.100000 0.30 2018-01-02 5 2 0.500000 0.100000 0.20 2015-01-02 
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you Valdi_Bo! This was really helpful but I have been required to calculate for every admission date. So, like the first row is for the admission date 1/2/20 to 1/2/17. But the admission in the second row is on the date 1/2/18, so the average date range for it would be from 1/2/18 to 1/2/15. I have to move down each row and for a particular member and date combination calculate the average for 3 years data window. A little complicated.
I created a totally new answer, taking into account your comment.
Valdi_Bo, thank you! This is the output that was required.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.