I have a pandas (v 0.12) dataframe data in python (2.7). I groupby() with respect to the A and B colmuns in data to form the groups object which is of type <class 'pandas.core.groupby.DataFrameGroupBy'>.
I want to loop through and apply a function to the dataframes within groups that have more than one row in them. My code is below, here each dataframe is the value in the key,value pair:
import pandas as pd groups = data.groupby(['A','B']) len(groups) >> 196320 # too large - will be slow to iterate through all for key, value in groups: if len(value)>1: print(value) Since I am only interested in applying the function to values where len(value)>1, is it possible to save time by embedding this condition to filter and loop through only the key-value pairs that satisfy this condition. I can do something like below to ascertain the size of each value but I am not sure how to marry this aggreagation with the original groups object.
size_values = data.groupby(['A','B']).agg({'C' : [np.size]}) I am hoping the question is clear, please let me know if any clarification is needed.