Adding rows to groups in Pandas DataFrame

Question

I have the following Pandas DataFrame:

        start_timestamp_milli  end_timestamp_milli       name  rating
     1          1555414708025        1555414723279    Valence       2
     2          1555414708025        1555414723279    Arousal       6
     3          1555414708025        1555414723279  Dominance       2
     4          1555414708025        1555414723279    Sadness       1
     5          1555414813304        1555414831795    Valence       3
     6          1555414813304        1555414831795    Arousal       5
     7          1555414813304        1555414831795  Dominance       2
     8          1555414813304        1555414831795    Sadness       1
     9          1555414921819        1555414931382    Valence       1
    10          1555414921819        1555414931382    Arousal       7
    11          1555414921819        1555414931382  Dominance       2
    12          1555414921819        1555414931382    Sadness       1
    13          1555414921819        1555414931382      Anger       1

In the example above there are three groups, defined by start_timestamp_milli and end_timestamp_milli. The first group runs from index 1 to 4, the second from index 5 to 8, and the third from index 9 to 13.
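For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the sample frame and confirms the three groups. The helper names (`spans`, `names`, `ratings`, `rows`, `group_sizes`) are made up for this illustration; the values are copied from the table above.

```python
import pandas as pd

# Start/end timestamps of the three groups, copied from the table above
spans = [
    (1555414708025, 1555414723279),
    (1555414813304, 1555414831795),
    (1555414921819, 1555414931382),
]
names = ['Valence', 'Arousal', 'Dominance', 'Sadness']
ratings = [[2, 6, 2, 1], [3, 5, 2, 1], [1, 7, 2, 1]]

rows = [
    (start, end, name, rating)
    for (start, end), group_ratings in zip(spans, ratings)
    for name, rating in zip(names, group_ratings)
]
# The third group additionally contains an 'Anger' row
rows.append((*spans[2], 'Anger', 1))

df = pd.DataFrame(
    rows,
    columns=['start_timestamp_milli', 'end_timestamp_milli', 'name', 'rating'],
)
df.index = range(1, len(df) + 1)  # the question numbers its rows from 1

# Number of rows per (start, end) group
group_sizes = df.groupby(['start_timestamp_milli', 'end_timestamp_milli']).size()
print(group_sizes.tolist())
```

The group sizes come out as 4, 4 and 5, matching the index ranges described above.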

For each such group, if "Anger" or "Happiness" is not present in the name column, I would like to insert it with a rating of 0. If it is already present, nothing should happen.

The final result should look like this. The added rows are 5, 6, 11, 12 and 17.

        start_timestamp_milli  end_timestamp_milli       name  rating
     1          1555414708025        1555414723279    Valence       2
     2          1555414708025        1555414723279    Arousal       6
     3          1555414708025        1555414723279  Dominance       2
     4          1555414708025        1555414723279    Sadness       1
     5          1555414708025        1555414723279  Happiness       0
     6          1555414708025        1555414723279      Anger       0
     7          1555414813304        1555414831795    Valence       3
     8          1555414813304        1555414831795    Arousal       5
     9          1555414813304        1555414831795  Dominance       2
    10          1555414813304        1555414831795    Sadness       1
    11          1555414813304        1555414831795  Happiness       0
    12          1555414813304        1555414831795      Anger       0
    13          1555414921819        1555414931382    Valence       1
    14          1555414921819        1555414931382    Arousal       7
    15          1555414921819        1555414931382  Dominance       2
    16          1555414921819        1555414931382    Sadness       1
    17          1555414921819        1555414931382  Happiness       0
    18          1555414921819        1555414931382      Anger       1

How can this be done?

piRSquared · Accepted Answer · 2019-05-09 15:24:32Z

Option 1

This explicitly loops through each group, appends a dummy DataFrame holding the 'Anger' and 'Happiness' rows, and drops duplicates so that rows already present in the group are kept.

    import pandas as pd

    d = dict(name=['Anger', 'Happiness'], rating=0)
    cols = ['start_timestamp_milli', 'end_timestamp_milli']

    def f(d0, k):
        # Dummy rows carrying this group's keys, one per required name, rated 0
        d1 = pd.DataFrame({**dict(zip(cols, k)), **d})
        # DataFrame.append was removed in pandas 2.0, so concatenate instead;
        # drop_duplicates keeps the first (original) row for each name
        return pd.concat([d0, d1], ignore_index=True).drop_duplicates('name')

    pd.concat([f(g, k) for k, g in df.groupby(cols)], ignore_index=True)

        start_timestamp_milli  end_timestamp_milli       name  rating
     0          1555414708025        1555414723279    Valence       2
     1          1555414708025        1555414723279    Arousal       6
     2          1555414708025        1555414723279  Dominance       2
     3          1555414708025        1555414723279    Sadness       1
     4          1555414708025        1555414723279      Anger       0
     5          1555414708025        1555414723279  Happiness       0
     6          1555414813304        1555414831795    Valence       3
     7          1555414813304        1555414831795    Arousal       5
     8          1555414813304        1555414831795  Dominance       2
     9          1555414813304        1555414831795    Sadness       1
    10          1555414813304        1555414831795      Anger       0
    11          1555414813304        1555414831795  Happiness       0
    12          1555414921819        1555414931382    Valence       1
    13          1555414921819        1555414931382    Arousal       7
    14          1555414921819        1555414931382  Dominance       2
    15          1555414921819        1555414931382    Sadness       1
    16          1555414921819        1555414931382      Anger       1
    17          1555414921819        1555414931382  Happiness       0

Option 2

This builds a new index containing every required (start, end, name) key and uses reindex to fill in the missing rows.

    cats = ['Anger', 'Happiness']
    cols = ['start_timestamp_milli', 'end_timestamp_milli']

    d = df.set_index([*cols, 'name'])

    # Every (start, end, name) key that must exist, united with the keys already present.
    # The original used `| d.index`; since pandas 2.0 that is no longer a set union.
    i = pd.MultiIndex.from_tuples(
        [(s, e, n) for s, e in {*zip(*map(df.get, cols))} for n in cats],
        names=d.index.names
    ).union(d.index)

    d.reindex(i, fill_value=0).reset_index()

        start_timestamp_milli  end_timestamp_milli       name  rating
     0          1555414708025        1555414723279      Anger       0
     1          1555414708025        1555414723279    Arousal       6
     2          1555414708025        1555414723279  Dominance       2
     3          1555414708025        1555414723279  Happiness       0
     4          1555414708025        1555414723279    Sadness       1
     5          1555414708025        1555414723279    Valence       2
     6          1555414813304        1555414831795      Anger       0
     7          1555414813304        1555414831795    Arousal       5
     8          1555414813304        1555414831795  Dominance       2
     9          1555414813304        1555414831795  Happiness       0
    10          1555414813304        1555414831795    Sadness       1
    11          1555414813304        1555414831795    Valence       3
    12          1555414921819        1555414931382      Anger       1
    13          1555414921819        1555414931382    Arousal       7
    14          1555414921819        1555414931382  Dominance       2
    15          1555414921819        1555414931382  Happiness       0
    16          1555414921819        1555414931382    Sadness       1
    17          1555414921819        1555414931382    Valence       1
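The mechanic Option 2 relies on can be seen in isolation on a toy series. This is only a sketch: the `group` level, the `'g1'` key and the variable names are hypothetical stand-ins for the two timestamp columns. `reindex` inserts every key that is missing from the index and gives it `fill_value`, while keys that already exist keep their values.

```python
import pandas as pd

# Hypothetical toy series indexed by (group, name)
s = pd.Series(
    [2, 1],
    index=pd.MultiIndex.from_tuples(
        [('g1', 'Valence'), ('g1', 'Anger')], names=['group', 'name']
    ),
    name='rating',
)

# Keys that must exist for the group
required = pd.MultiIndex.from_product(
    [['g1'], ['Anger', 'Happiness']], names=['group', 'name']
)

# union keeps the existing keys and adds any required key that is missing
target = s.index.union(required)

# Missing keys get rating 0; existing ratings are left alone
filled = s.reindex(target, fill_value=0)
print(filled.to_dict())
```

Here 'Anger' keeps its rating of 1, and only 'Happiness' is added with 0.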

The difference between my answer and @WeNYoBen's is that theirs adds a zero for every unique value in name, for each grouping of start and end. This may be what the OP wanted, but the question did not say so explicitly. I intentionally went to extra trouble to add only 'Anger' and 'Happiness', and only where they didn't already exist in a group. OP and any future readers, please choose according to your needs.

Your approach worked well. I would like to add another row to each grouping of start and end, namely a row with name "Neutral". The rating column of the "Neutral" row should be 1 if Anger and Sadness have a rating of 0, and 0 otherwise. Is this possible?
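One possible reading of this follow-up request can be sketched as below. It is not part of the original answer. The frame `out` is a hypothetical miniature of the result above with made-up timestamps, and the sketch assumes that a group with no 'Anger' or 'Sadness' row counts as having a rating of 0 for that name.

```python
import pandas as pd

cols = ['start_timestamp_milli', 'end_timestamp_milli']

# Hypothetical miniature of the answer's output: two groups, made-up timestamps
out = pd.DataFrame({
    'start_timestamp_milli': [10, 10, 10, 20, 20, 20],
    'end_timestamp_milli':   [15, 15, 15, 25, 25, 25],
    'name':   ['Sadness', 'Anger', 'Happiness'] * 2,
    'rating': [0, 0, 0, 1, 0, 0],
})

# One row per group with the 'Anger' and 'Sadness' ratings side by side;
# a name that is absent from a group is treated as 0 (assumption)
wide = (
    out.set_index([*cols, 'name'])['rating']
       .unstack(fill_value=0)
       .reindex(columns=['Anger', 'Sadness'], fill_value=0)
)

# Neutral is 1 only when both ratings are 0
neutral = wide.eq(0).all(axis=1).astype(int)
neutral_rows = neutral.rename('rating').reset_index().assign(name='Neutral')

# Append the new rows, then stably reorder so each one follows its own group
combined = pd.concat([out, neutral_rows[out.columns]], ignore_index=True)
order = combined.groupby(cols).ngroup().sort_values(kind='mergesort').index
result = combined.loc[order].reset_index(drop=True)
print(result)
```

In this miniature the first group gets Neutral = 1 (Anger and Sadness are both 0) and the second gets Neutral = 0 (Sadness is 1).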

BENY · Accepted Answer · 2019-05-09 15:20:28Z


I am using unstack + stack + reindex

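    # De-duplicated, ordered list of every name that should appear in each group
    s = sorted(set(df.name.unique().tolist() + ['Anger', 'Happiness']))

    # Pivot names into columns, add the missing ones as 0, then pivot back to rows
    df.set_index(df.columns[:-1].tolist()).rating.\
        unstack(fill_value=0).\
        reindex(columns=s, fill_value=0).\
        stack().reset_index(name='rating')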
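To see what this approach does when groups do not share the same names, here is a small sketch on a hypothetical two-group frame (`small`, with made-up timestamps) in which the second group has no 'Sadness' row. The `rename_axis` call is added only to make the column label explicit before stacking.

```python
import pandas as pd

# Hypothetical two-group frame: the second group has no 'Sadness' row
small = pd.DataFrame({
    'start_timestamp_milli': [10, 10, 20],
    'end_timestamp_milli':   [15, 15, 25],
    'name':   ['Valence', 'Sadness', 'Valence'],
    'rating': [2, 1, 3],
})

# Every name seen anywhere in the frame, plus the two required ones
wanted = sorted(set(small['name']) | {'Anger', 'Happiness'})

filled = (
    small.set_index(['start_timestamp_milli', 'end_timestamp_milli', 'name'])['rating']
         .unstack(fill_value=0)                 # names become columns
         .reindex(columns=wanted, fill_value=0)  # add missing names as 0
         .rename_axis(columns='name')
         .stack()                                # back to one row per name
         .reset_index(name='rating')
)

# Rows of the second group after filling
second = filled[filled['start_timestamp_milli'] == 20]
print(second[['name', 'rating']].to_string(index=False))
```

The second group ends up with a 'Sadness' row rated 0 in addition to 'Anger' and 'Happiness', because every name present anywhere in the frame is filled into every group.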

edited May 9, 2019 at 15:20

answered May 9, 2019 at 15:04


3 Comments

Vink Over a year ago

This almost works, add a .drop_duplicates().reset_index() to get the desired output

BENY Over a year ago

@Vink thank you, I think using set is slightly better than drop_duplicates

Vink Over a year ago

agreed, safer & more precise
