I am looking for code that removes the rows where REVENUE_STATUS_FLAG is 0 before it first becomes 1 for a given SCU_KEY. However, I do not want to remove the 0's that come after a 1 has occurred for that SCU_KEY (the rows are sorted by date). In the image shown, these are the rows I would like to remove, listed by index:

1166, 221 - (first two of SCU_KEY 3)

333, 1645, 1614 - (from SCU_KEY 14)

You will notice that I don't want to remove 334 from SCU_KEY 3, because a 1 occurs right before it at index 45. The df is bigger than what is shown, so manually entering the specific indices mentioned is not an option.

  • In the shown table, "SCU_KEY" is never 0 or 1. Commented Oct 20, 2021 at 22:10
  • Thank you for that correction! I meant to say REVENUE_STATUS_FLAG. You can read above for where I made the change as well. Sorry :) Commented Oct 20, 2021 at 22:12

1 Answer

Since you only care about the 0's before the first 1, you can take the cumulative sum of REVENUE_STATUS_FLAG within each SCU_KEY. Because the flag is only ever 0 or 1, the cumulative sum at a row is the number of 1's seen at or before that row, provided the rows are sorted by date. As long as that count is 1 or more, you keep the row.

    # df is your data frame, already sorted by date
    df[df.groupby('SCU_KEY')['REVENUE_STATUS_FLAG'].cumsum() >= 1]
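To make the idea concrete, here is a minimal sketch on toy data. The index labels for the 0 rows echo the ones in the question, but the DATE column name, the dates themselves, and the row labelled 2000 are assumptions made up for illustration, since the real data is only shown in the image.

```python
import pandas as pd

# Toy data: the DATE column, its values, and the row labelled 2000 are
# hypothetical; only the other index labels come from the question.
df = pd.DataFrame(
    {
        "SCU_KEY": [3, 3, 3, 3, 14, 14, 14, 14],
        "DATE": pd.to_datetime(
            ["2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01",
             "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"]
        ),
        "REVENUE_STATUS_FLAG": [0, 0, 1, 0, 0, 0, 0, 1],
    },
    index=[1166, 221, 45, 334, 333, 1645, 1614, 2000],
)

# The running total is only meaningful in date order within each key.
df = df.sort_values(["SCU_KEY", "DATE"])

# Per-key running count of 1's seen so far, including the current row.
seen_ones = df.groupby("SCU_KEY")["REVENUE_STATUS_FLAG"].cumsum()

# Keep a row once its key has had at least one 1.
result = df[seen_ones >= 1]
print(result)
```

Selecting the REVENUE_STATUS_FLAG column before calling cumsum keeps the mask one-dimensional and aligned with df's index, so it filters rows rather than individual cells. Note that an SCU_KEY whose flag never reaches 1 loses all of its rows under this filter.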

1 Comment

Thank you! That worked!
