0

I am quite new to pandas, but I use python at a good level.

I have a pandas dataframe which is organized as follows

idrun idbasin time q -192540 1 0 0 -192540 1 1 0.5 ... -192540 2 0 0 -192540 2 1 1 ... -192540 3 0 0 -192540 3 1 1 ... -192541 1 0 0 -192541 1 1 0.5 ... -192541 2 0 0 -192541 2 1 1 ... -192541 3 0 0 -192541 3 1 1 ... 

It is a fairly large dataframe (7 columns and ~600k rows).

What I would like to do is: given a tuple containing values referring to the idbasin column (e.g. (1,2)), if the idrun value is the same

  1. sum the q column of the referred idbasin values, i.e. for the example it would be (1,2)
  2. remove the rows corresponding to that idrun value and the tuple-specified idbasin values
  3. insert the summed values with idbasin equal to the first number of the tuple.

Referring to my example df, the results would be

idrun idbasin time q -192540 1 0 0 -192540 1 1 1.5 ... -192540 3 0 0 -192540 3 1 1 ... -192541 1 0 0 -192541 1 1 1.5 ... -192541 3 0 0 -192541 3 1 1 ... 

My solution would to use groupby to turn the df to a dict and then do the operation with one or two for loops, but I understand that iterating in pandas is not the optimal solution, so I believe there could be a "pandas" solution using the df.

1 Answer 1

1

You can replace values of tuple by first value of tuple in Series.mask and then aggregate sum:

tup = (1, 2) df['idbasin'] = df['idbasin'].mask(df['idbasin'].isin(tup), tup[0]) #alternative #df['idbasin'] = np.where(df['idbasin'].isin(tup), tup[0], df['idbasin']) df = df.groupby(['idrun', 'idbasin','time'], as_index=False)['q'].sum() print (df) idrun idbasin time q 0 -192541 1 0 0.0 1 -192541 1 1 1.5 2 -192541 3 0 0.0 3 -192541 3 1 1.0 4 -192540 1 0 0.0 5 -192540 1 1 1.5 6 -192540 3 0 0.0 7 -192540 3 1 1.0 
Sign up to request clarification or add additional context in comments.

1 Comment

That's great. Thanks!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.