Since pandas 0.25.0 we have named aggregations, which work fine if you aggregate single columns. But what if you want to apply an aggregation over multiple columns?
Example:
    import numpy as np
    import pandas as pd

    # example dataframe
    df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))
    df['group'] = [0, 0, 1, 1]

              a         b         c         d  group
    0  0.751462  0.572576  0.192957  0.921723      0
    1  0.070777  0.801548  0.601678  0.344633      0
    2  0.112964  0.361984  0.416241  0.785764      1
    3  0.380045  0.486494  0.000594  0.608759      1

    # aggregations on single columns
    df.groupby('group').agg(
        a_sum=('a', 'sum'),
        a_mean=('a', 'mean'),
        b_mean=('b', 'mean'),
        c_sum=('c', 'sum'),
        d_range=('d', lambda x: x.max() - x.min())
    )

              a_sum    a_mean    b_mean     c_sum   d_range
    group
    0      0.947337  0.473668  0.871939  0.838150  0.320543
    1      0.604149  0.302074  0.656902  0.542985  0.057681

But what if we want to calculate `a.max() - b.max()` while aggregating? That does not seem to work. For example, something like this would make sense:
    df.groupby('group').agg(
        diff_a_b=(['a', 'b'], lambda x: x['a'].max() - x['b'].max())
    )

So is it possible to do named aggregations on multiple columns? If not, is this in the pipeline for future releases?
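For reference, here is a minimal sketch of two possible workarounds using calls that already exist, not the named-aggregation syntax asked for above. The fixed sample values and the helper name `summarize` are made up for illustration. The first variant passes each group's sub-frame to `apply` and returns a named `Series`; the second uses ordinary single-column named aggregations and combines the results afterwards.

```python
import pandas as pd

# Small fixed frame (illustrative values, not the random data from above)
df = pd.DataFrame(
    {
        "a": [1.0, 4.0, 2.0, 3.0],
        "b": [0.5, 2.0, 5.0, 1.0],
        "group": [0, 0, 1, 1],
    }
)


def summarize(g):
    # Hypothetical helper: g is the sub-frame of one group (columns a and b),
    # so several columns can be combined into one named result.
    return pd.Series(
        {
            "diff_a_b": g["a"].max() - g["b"].max(),
            "a_sum": g["a"].sum(),
        }
    )


# Workaround 1: apply a function that sees all selected columns per group
result = df.groupby("group")[["a", "b"]].apply(summarize)

# Workaround 2: aggregate each column by name, then combine the columns
maxes = df.groupby("group").agg(a_max=("a", "max"), b_max=("b", "max"))
diff = (maxes["a_max"] - maxes["b_max"]).rename("diff_a_b")
```

The `apply` route is the more flexible of the two but usually slower on many groups, since it calls a Python function per group. The second route stays within built-in aggregations, which only helps when the multi-column result can be expressed in terms of per-column aggregates.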