1

Hello I would like to calculate the average of days column for each Name group at each Time. The average at each Time for each group should be calculated only based on the elements of the rows of the Days which are larger than 0. Any feedback would be highly appreciated ..

Name Time Days Average John 2021-12-02 0 0 John 2021-12-03 2 0 John 2021-12-05 9 2 John 2021-12-07 0 5.5 John 2021-12-10 10 5.5 Larry 2021-12-02 20 0 Jim 2021-12-09 20 0 Jim 2021-12-10 20 20 Jim 2021-12-12 40 20 Jim 2021-12-12 0 26.6 Juli 2021-11-09 0 0 Juli 2021-11-10 0 0 Juli 2021-11-12 40 0 Juli 2021-11-18 0 40 Juli 2021-11-12 0 40 Juli 2021-11-18 2 40 Juli 2021-11-19 0 21 
1
  • please share dataframe creator code Commented Jan 25, 2022 at 14:35

1 Answer 1

1

First replace 0 to missing values and then use GroupBy.transform with lambda function for Series.expanding with mean and Series.shift, last replace NaNs to 0 by Series.fillna:

df['Avg'] = (df.assign(Days = df['Days'].replace(0,np.nan)) .groupby('Name')['Days'] .transform(lambda x: x.expanding().mean().shift()) .fillna(0)) print (df) Name Time Days Average Avg 0 John 2021-12-02 0 0.0 0.000000 1 John 2021-12-03 2 0.0 0.000000 2 John 2021-12-05 9 2.0 2.000000 3 John 2021-12-07 0 5.5 5.500000 4 John 2021-12-10 10 5.5 5.500000 5 Larry 2021-12-02 20 0.0 0.000000 6 Jim 2021-12-09 20 0.0 0.000000 7 Jim 2021-12-10 20 20.0 20.000000 8 Jim 2021-12-12 40 20.0 20.000000 9 Jim 2021-12-12 0 26.6 26.666667 10 Juli 2021-11-09 0 0.0 0.000000 11 Juli 2021-11-10 0 0.0 0.000000 12 Juli 2021-11-12 40 0.0 0.000000 13 Juli 2021-11-18 0 40.0 40.000000 14 Juli 2021-11-12 0 40.0 40.000000 15 Juli 2021-11-18 2 40.0 40.000000 16 Juli 2021-11-19 0 21.0 21.000000 
Sign up to request clarification or add additional context in comments.

2 Comments

I would like the average to be calculated only taken into account non zero Days elements the solution provided uses all the elements to calculate the average for each group which is not what i would like to do i.e for row 4 for John uses row 0,1,2 and 3 I would like to use only elements with non zero days element row1 and 2 and not row 0 and 3
@konrac - yes, and for it is replaced 0 to np.nan, then are not used rows 0,3, because filled by missing values. So not sure if understand what is problem.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.