I have a pandas (v0.12) DataFrame data in Python 2.7. I call groupby() on the A and B columns of data to form the groups object, which is of type <class 'pandas.core.groupby.DataFrameGroupBy'>.

I want to loop through groups and apply a function only to the DataFrames that have more than one row. My code is below; each DataFrame is the value in a key, value pair:

import pandas as pd

groups = data.groupby(['A','B'])
len(groups)  # >> 196320: too large, will be slow to iterate through all

for key, value in groups:
    if len(value)>1:
        print(value)

Since I am only interested in applying the function to values where len(value)>1, is it possible to save time by embedding this condition as a filter, so that I loop through only the key-value pairs that satisfy it? I can do something like the following to get the size of each value, but I am not sure how to marry this aggregation with the original groups object.

import numpy as np
size_values = data.groupby(['A','B']).agg({'C' : [np.size]})

I hope the question is clear; please let me know if any clarification is needed.

1 Answer

You could assign the size of each group back to a column and filter on its value:

data['count'] = data.groupby(['A','B'],as_index=False)['A'].transform(np.size) 

After that you could:

data[data['count'] > 1].groupby(['A','B']).apply(your_function) 

Or skip the assignment if it is a one-time operation:

 data[data.groupby(['A','B'],as_index=False)['A'].transform(np.size) > 1].groupby(['A','B']).apply(your_function) 
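
To make the idea concrete, here is a minimal sketch on a hypothetical toy frame. Only the column names A, B and C come from the question; the values are made up, sum() on column C merely stands in for your_function, and transform("size") is the string form of the size computation as accepted by recent pandas versions (an assumption for 0.12, where np.size as above may be needed).

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
data = pd.DataFrame({
    "A": ["x", "x", "y", "z", "z", "z"],
    "B": [1, 1, 1, 2, 2, 3],
    "C": [10, 20, 30, 40, 50, 60],
})

# Size of each (A, B) group, broadcast back to every row of that group.
group_size = data.groupby(["A", "B"])["A"].transform("size")

# Keep only the rows that belong to groups with more than one row.
multi_row = data[group_size > 1]

# Regroup the filtered rows; single-row groups no longer appear.
kept_keys = [key for key, frame in multi_row.groupby(["A", "B"])]

# Stand-in for your_function: sum column C within each surviving group.
totals = multi_row.groupby(["A", "B"])["C"].sum()
```

The filtering happens in one vectorised pass over the rows, so the Python-level loop (or apply) only ever sees groups that pass the size test.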

4 Comments

Thank you, but the first line on its own I think runs longer than my for loop. I did not realise I could use .apply() on the DataFrameGroupBy though, so maybe that would speed things up (as compared to my clumsy for loop)
.transform is usually pretty fast and combined with np.size it is unlikely it will be slower than your function.
so it is .transform(np.size), not .transform('count')
Correct, you can use whichever is faster, as long as the function you pass to transform returns a scalar.
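
One thing to keep in mind when choosing between the two, shown as a small sketch on hypothetical data: size counts every row in the group, while count only counts non-null values in the selected column, so the two can disagree when that column contains NaN. The string names "size" and "count" are used here as accepted by recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column C.
df = pd.DataFrame({"A": ["x", "x", "y"], "C": [1.0, np.nan, 3.0]})

# Number of rows per group, NaN included.
size_per_row = df.groupby("A")["C"].transform("size")

# Number of non-null values of C per group.
count_per_row = df.groupby("A")["C"].transform("count")
```

Grouping on a key column that has no missing values, as the answer does with ['A'], avoids the discrepancy.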
