
I understand that pandas is designed to load completely filled dataframes, but I want to add single rows to the data frame under a specific condition. Here is the df I am dealing with:

       ProjID    Xcoord    Ycoord
    0       2  some_val  some_val
    1       2  some_val  some_val
    2       2  some_val  some_val
    3       2  some_val  some_val
    4       2  some_val  some_val
    5       3  some_val  some_val
    6       3  some_val  some_val
    7       5  some_val  some_val
    8       5  some_val  some_val
    9       5  some_val  some_val

What I want is to insert a row with value 0 in every column whenever the ProjID changes. Below is the required df:

        ProjID    Xcoord    Ycoord
    0        2  some_val  some_val
    1        2  some_val  some_val
    2        2  some_val  some_val
    3        2  some_val  some_val
    4        2  some_val  some_val
    5        0         0         0
    6        3  some_val  some_val
    7        3  some_val  some_val
    8        0         0         0
    9        5  some_val  some_val
    10       5  some_val  some_val
    11       5  some_val  some_val
    12       0         0         0

Basically, a row with 0 values is inserted every time the ProjID changes. I was trying to write a for loop where the value in the ProjID column is checked against the previous row's value: if they are the same, it moves ahead; if they are not, it should insert a row with all 0 values. But I am unable to implement it. Also, I have no idea how the index column will behave.

Please let me know if this kind of row insertion is possible in a df, and how to do it. Thanks for any help.

2 Answers


You can group the data frame by ProjID and use the append method to add a row of zeros at the end of each sub data frame. This assumes all rows with the same ProjID are already sorted together beforehand:

    (df.groupby("ProjID", group_keys=False)
       .apply(lambda g: g.append(pd.Series(0, g.columns), ignore_index=True))
       .reset_index(drop=True))
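Note that DataFrame.append was removed in pandas 2.0, so on recent versions the same idea needs pd.concat. A minimal sketch, assuming the same df as in the question (the zero_row name is just for illustration):

    import pandas as pd

    # Build the all-zero row as a one-row DataFrame and concatenate it
    # after each group, since DataFrame.append no longer exists.
    def zero_row(g):
        zeros = pd.DataFrame([dict.fromkeys(g.columns, 0)])
        return pd.concat([g, zeros], ignore_index=True)

    out = (df.groupby("ProjID", group_keys=False)
             .apply(zero_row)
             .reset_index(drop=True))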



3 Comments

Wow! I spent a lot of time coming up with an alternative :-)
@Psidom Thank you so much for your help. It's perfect.
@Psidom can you please suggest a way to count only the number of rows with 0s (the ones we are inserting) in the data frame?

I was determined to find a different answer after @Psidom and I thought of the exact same answer and he was quicker. So, to make myself feel better, I came up with the answer below.

The difference between mine and @Psidom's: when using groupby, all rows with the same ProjID get grouped together, even if they are not next to each other. If they are always expected to be adjacent, then it doesn't matter. However, if you want to count 111222111 as two separate changes, this solution accounts for that (see the sketch after the code below).

    import numpy as np

    pidv = df.ProjID.values
    pid_chg = np.append(pidv[:-1] != pidv[1:], True)  # True at the last row of each run
    i = df.index.repeat(pid_chg + 1)                  # repeat those rows once more
    d1 = df.loc[i, :].copy()
    d1.loc[i.duplicated()] = 0                        # zero out the repeated copies
    d1.reset_index(drop=True)

        ProjID    Xcoord    Ycoord
    0        2  some_val  some_val
    1        2  some_val  some_val
    2        2  some_val  some_val
    3        2  some_val  some_val
    4        2  some_val  some_val
    5        0         0         0
    6        3  some_val  some_val
    7        3  some_val  some_val
    8        0         0         0
    9        5  some_val  some_val
    10       5  some_val  some_val
    11       5  some_val  some_val
    12       0         0         0
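To see the difference concretely, here is a small sketch with made-up data where ProjID 1 occurs in two separate runs (the values are hypothetical, just to illustrate the 111222111 case):

    import numpy as np
    import pandas as pd

    # Hypothetical data: ProjID 1 appears in two non-adjacent runs.
    df = pd.DataFrame({"ProjID": [1, 1, 1, 2, 2, 2, 1, 1, 1]})

    pidv = df.ProjID.values
    pid_chg = np.append(pidv[:-1] != pidv[1:], True)
    i = df.index.repeat(pid_chg + 1)
    d1 = df.loc[i, :].copy()
    d1.loc[i.duplicated()] = 0
    d1.reset_index(drop=True)
    # Yields three zero rows (one after each run: 1s, 2s, 1s), whereas
    # grouping by ProjID would merge the two runs of 1s into one group
    # and insert only two zero rows.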

Timing over sample data (benchmark figure).

7 Comments

Thank you so much. Since your way is faster, I will implement yours, though the code is a little longer.
Also, can you please suggest a way to count only the number of rows with 0s (the ones we are inserting) in the data frame?
@Liza Conveniently, i.duplicated().sum() should give you that.
It gives the following error: AttributeError: 'int' object has no attribute 'duplicated'.
@Liza in my code i = df.index.repeat(pid_chg + 1)... i should not be an int. You might be overwriting it somewhere else.
