I have a DataFrame with 200k rows, and I am trying to add columns whose values are computed from other rows that match some conditions. My current approach works, but it takes a very long time (about 2 hours).
Here is my code:

    for index in dataset.index:
        A_id = dataset.loc[index, 'A_id']
        B_id = dataset.loc[index, 'B_id']
        C_date = dataset.loc[index, 'C_date']
        # Rows with the same A_id and B_id and a strictly earlier date
        subset = dataset[
            (dataset['A_id'] == A_id)
            & (dataset['B_id'] == B_id)
            & (dataset['C_date'] < C_date)]
        dataset.at[index, 'D_mean'] = subset['D'].mean()
        dataset.at[index, 'E_mean'] = subset['E'].mean()

My DataFrame looks like this:

    import pandas as pd

    A = [1, 2, 1, 2, 1, 2]
    B = [10, 20, 10, 20, 10, 20]
    C = ["22-02-2019", "28-02-19", "07-03-2019", "14-03-2019", "21-12-2019", "11-10-2019"]
    D = [10, 12, 21, 81, 20, 1]
    E = [7, 10, 14, 31, 61, 9]
    dataset = pd.DataFrame({
        'A_id': A,
        'B_id': B,
        'C_date': C,
        'D': D,
        'E': E,
    })
    dataset.C_date = pd.to_datetime(dataset.C_date)
    dataset

    Out[27]:
       A_id  B_id     C_date   D   E
    0     1    10 2019-02-22  10   7
    1     2    20 2019-02-28  12  10
    2     1    10 2019-07-03  21  14
    3     2    20 2019-03-14  81  31
    4     1    10 2019-12-21  20  61
    5     2    20 2019-11-10   1   9

I would like to get the following result in a more efficient way than my current solution:

       A_id  B_id     C_date   D   E  D_mean  E_mean
    0     1    10 2019-02-22  10   7     NaN     NaN
    1     2    20 2019-02-28  12  10     NaN     NaN
    2     1    10 2019-07-03  21  14    10.0     7.0
    3     2    20 2019-03-14  81  31    12.0    10.0
    4     1    10 2019-12-21  20  61    15.5    10.5
    5     2    20 2019-11-10   1   9    46.5    20.5

Do you have an idea?
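
For reference, one direction that might avoid the row-by-row loop is a grouped cumulative sum: sort by date within each (A_id, B_id) group, take the running sum, subtract the current row, and divide by the number of earlier rows. The following is only a sketch, not a verified solution for the full 200k rows. It assumes that dates are unique within each (A_id, B_id) group (rows sharing a date would be counted as "earlier", unlike the strict `<` in the loop). The names `keys`, `ordered`, `grouped`, `n_prev` and `prev_sum` are made up for illustration, and the dates are typed in as they appear in the printed frame.

```python
import pandas as pd

# Sample data, with C_date written as shown in the printed frame above.
dataset = pd.DataFrame({
    'A_id': [1, 2, 1, 2, 1, 2],
    'B_id': [10, 20, 10, 20, 10, 20],
    'C_date': pd.to_datetime([
        "2019-02-22", "2019-02-28", "2019-07-03",
        "2019-03-14", "2019-12-21", "2019-11-10",
    ]),
    'D': [10, 12, 21, 81, 20, 1],
    'E': [7, 10, 14, 31, 61, 9],
})

keys = ['A_id', 'B_id']

# Put the rows of each (A_id, B_id) group in date order.
# Assumption: no two rows of a group share the same C_date.
ordered = dataset.sort_values(keys + ['C_date'])
grouped = ordered.groupby(keys)

# Number of earlier rows in the same group (0 for the first row of a group).
n_prev = grouped.cumcount()

for col in ['D', 'E']:
    # Running sum including the current row, minus the current value,
    # leaves the sum over the earlier rows only.
    prev_sum = grouped[col].cumsum() - ordered[col]
    # A count of 0 is turned into NaN so the first row of each group gets NaN,
    # like mean() of an empty subset. Assignment aligns on the original index.
    dataset[col + '_mean'] = prev_sum / n_prev.where(n_prev > 0)

print(dataset)
```

On the sample data this gives the same D_mean and E_mean columns as the expected result above. If dates can repeat within a group, the sums and counts would first have to be aggregated per date, which this sketch does not do.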