
Named aggregations with multiple columns #29268

@erfannariman


Since pandas 0.25.0 we have named aggregations.

They work fine when each aggregation uses a single column. But what if you want to apply an aggregation over multiple columns?

Example:

```python
import numpy as np
import pandas as pd

# example dataframe
df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))
df['group'] = [0, 0, 1, 1]
```

```
          a         b         c         d  group
0  0.751462  0.572576  0.192957  0.921723      0
1  0.070777  0.801548  0.601678  0.344633      0
2  0.112964  0.361984  0.416241  0.785764      1
3  0.380045  0.486494  0.000594  0.608759      1
```

```python
# aggregations on single columns
# note: np.random.rand is unseeded, so the numbers below do not match the frame printed above
df.groupby('group').agg(
    a_sum=('a', 'sum'),
    a_mean=('a', 'mean'),
    b_mean=('b', 'mean'),
    c_sum=('c', 'sum'),
    d_range=('d', lambda x: x.max() - x.min())
)
```

```
          a_sum    a_mean    b_mean     c_sum   d_range
group
0      0.947337  0.473668  0.871939  0.838150  0.320543
1      0.604149  0.302074  0.656902  0.542985  0.057681
```
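The same single-column aggregations can also be written with `pd.NamedAgg`, the named tuple pandas introduced together with named aggregation in 0.25.0. A minimal sketch on a small, fixed frame (the values are made up for illustration so the results are reproducible):

```python
import pandas as pd

# small deterministic frame (illustrative values, not the random data above)
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 5.0],
    'b': [0.5, 4.0, 1.0, 2.0],
    'group': [0, 0, 1, 1],
})

# pd.NamedAgg(column=..., aggfunc=...) is equivalent to the plain ('a', 'sum') tuple form;
# each aggfunc still receives one column (a Series) per group
result = df.groupby('group').agg(
    a_sum=pd.NamedAgg(column='a', aggfunc='sum'),
    a_mean=pd.NamedAgg(column='a', aggfunc='mean'),
    b_range=pd.NamedAgg(column='b', aggfunc=lambda x: x.max() - x.min()),
)
print(result)
```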

But what if we want to calculate `a.max() - b.max()` while aggregating? That does not seem to work with named aggregation. For example, something like this would make sense:

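```python
# proposed syntax: pass a list of columns so the function sees both 'a' and 'b'
# (this is not supported by named aggregation)
df.groupby('group').agg(
    diff_a_b=(['a', 'b'], lambda x: x['a'].max() - x['b'].max())
)
```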
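Named aggregation hands each function one column (a Series) per group, so a single function cannot read `a` and `b` together. For this particular quantity, one workaround is to aggregate each column separately and derive the difference afterwards, because `a.max() - b.max()` depends only on the per-column maxima. A minimal sketch on a made-up frame; the helper column names `a_max` and `b_max` are illustrative:

```python
import pandas as pd

# illustrative values
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 5.0],
    'b': [0.5, 4.0, 1.0, 2.0],
    'group': [0, 0, 1, 1],
})

result = (
    df.groupby('group')
      # step 1: per-column maxima via ordinary named aggregation
      .agg(a_max=('a', 'max'), b_max=('b', 'max'))
      # step 2: combine the aggregated columns into the multi-column result
      .assign(diff_a_b=lambda d: d['a_max'] - d['b_max'])
)
print(result)
```

If only the final column is wanted, `.drop(columns=['a_max', 'b_max'])` removes the intermediate ones. This only works when the multi-column result can be built from independent per-column aggregates.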

So is it possible to do named aggregations on multiple columns? If not, is this in the pipeline for future releases?
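For aggregations that genuinely need several columns at once (not just per-column results combined afterwards), one workaround is `groupby(...).apply` with a function that returns a `pd.Series`: the index labels of that Series become the output column names, which gives a result similar to named aggregation. A minimal sketch; the function name `summarize` and the output names are hypothetical:

```python
import pandas as pd

# illustrative values
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 5.0],
    'b': [0.5, 4.0, 1.0, 2.0],
    'group': [0, 0, 1, 1],
})

def summarize(g: pd.DataFrame) -> pd.Series:
    # g holds the selected columns of one group, so expressions can mix columns
    return pd.Series({
        'diff_a_b': g['a'].max() - g['b'].max(),
        'sum_a_times_b': (g['a'] * g['b']).sum(),
    })

# select the value columns explicitly so the grouping column is not passed to summarize
result = df.groupby('group')[['a', 'b']].apply(summarize)
print(result)
```

Because `apply` calls a Python function once per group, it is typically slower than the built-in aggregations on large frames.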
