
Let's say I have a time series dataframe with a categorical variable and a value:

    In [4]: df = pd.DataFrame(data={'category': np.random.choice(['A', 'B', 'C', 'D'], 11),
       ...:                         'value': np.random.rand(11)},
       ...:                   index=pd.date_range('2015-04-20','2015-04-30'))

    In [5]: df
    Out[5]:
               category     value
    2015-04-20        D  0.220804
    2015-04-21        A  0.992445
    2015-04-22        A  0.743648
    2015-04-23        B  0.337535
    2015-04-24        B  0.747340
    2015-04-25        B  0.839823
    2015-04-26        D  0.292628
    2015-04-27        D  0.906340
    2015-04-28        B  0.244044
    2015-04-29        A  0.070764
    2015-04-30        D  0.132221

If I'm interested in rows with category A, filtering to isolate them is trivial. But what if I'm interested in the n rows before category A's as well? If n=2, I'd like to see something like:

    In [5]: df[some boolean indexing]
    Out[5]:
               category     value
    2015-04-20        D  0.220804
    2015-04-21        A  0.992445
    2015-04-22        A  0.743648
    2015-04-27        D  0.906340
    2015-04-28        B  0.244044
    2015-04-29        A  0.070764

Similarly, what if I'm interested in the n rows around category A's? Again if n=2, I'd like to see this:

    In [5]: df[some other boolean indexing]
    Out[5]:
               category     value
    2015-04-20        D  0.220804
    2015-04-21        A  0.992445
    2015-04-22        A  0.743648
    2015-04-23        B  0.337535
    2015-04-24        B  0.747340
    2015-04-27        D  0.906340
    2015-04-28        B  0.244044
    2015-04-29        A  0.070764
    2015-04-30        D  0.132221

Thanks!


2 Answers


n rows around category A's:

    In [223]: idx = df.index.get_indexer_for(df[df.category=='A'].index)

    In [224]: n = 1

    In [225]: df.iloc[np.unique(np.concatenate([np.arange(max(i-n,0), min(i+n+1, len(df))) for i in idx]))]
    Out[225]:
               category     value
    2015-04-20        D  0.220804
    2015-04-21        A  0.992445
    2015-04-22        A  0.743648
    2015-04-23        B  0.337535
    2015-04-28        B  0.244044
    2015-04-29        A  0.070764
    2015-04-30        D  0.132221

    In [226]: n = 2

    In [227]: df.iloc[np.unique(np.concatenate([np.arange(max(i-n,0), min(i+n+1, len(df))) for i in idx]))]
    Out[227]:
               category     value
    2015-04-20        D  0.220804
    2015-04-21        A  0.992445
    2015-04-22        A  0.743648
    2015-04-23        B  0.337535
    2015-04-24        B  0.747340
    2015-04-27        D  0.906340
    2015-04-28        B  0.244044
    2015-04-29        A  0.070764
    2015-04-30        D  0.132221
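The same positional-window idea can be wrapped in a small helper that covers both the "n rows before" and the "n rows around" cases. This is only a sketch: the function name `rows_near` and its `before`/`after` parameters are made up for illustration, and the category column is hard-coded to match the question's sample instead of being drawn at random.

```python
import numpy as np
import pandas as pd


def rows_near(df, mask, before=0, after=0):
    """Rows where `mask` is True, plus `before` rows ahead of and `after` rows behind each.

    Hypothetical helper, not part of pandas.
    """
    # Integer positions of the matching rows.
    hits = np.flatnonzero(np.asarray(mask))
    # Broadcast each hit against the window of offsets, then flatten.
    positions = (hits[:, None] + np.arange(-before, after + 1)).ravel()
    # Drop positions that fall off either end of the frame.
    positions = positions[(positions >= 0) & (positions < len(df))]
    # np.unique sorts and de-duplicates overlapping windows.
    return df.iloc[np.unique(positions)]


# Same categories as the question's sample; the values are placeholders.
df = pd.DataFrame(
    {"category": list("DAABBBDDBAD"), "value": np.arange(11) / 10},
    index=pd.date_range("2015-04-20", "2015-04-30"),
)

before_only = rows_near(df, df["category"] == "A", before=2)
around = rows_near(df, df["category"] == "A", before=2, after=2)
```

Working with integer positions instead of index labels means the helper should not care whether the index is a date range, and an empty mask simply yields an empty frame.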

2 Comments

Any thoughts on how this could be solved when looking at a grouped dataframe?
I did something like this:

    def takeRange(df: pd.DataFrame, range: int):
        idx = df.index.get_indexer_for(df[df["Stop"] != "Running"].index)
        return df.iloc[
            np.unique(
                np.concatenate(
                    [np.arange(max(i - range, 0), min(i + range + 1, len(df))) for i in idx]
                )
            )
        ]
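For the grouped case asked about above, one possible approach is to apply the window function to each group separately and concatenate the pieces, so that a window never reaches into a neighbouring group. The sketch below assumes a grouping column called `machine`, which is hypothetical; the `Stop` column and the `"Running"` value come from the comment.

```python
import numpy as np
import pandas as pd


def take_range(group, n):
    """Rows within n positions of any row whose Stop is not "Running"."""
    hits = np.flatnonzero((group["Stop"] != "Running").to_numpy())
    positions = (hits[:, None] + np.arange(-n, n + 1)).ravel()
    # Clip to the group's own bounds so windows stay inside the group.
    positions = positions[(positions >= 0) & (positions < len(group))]
    return group.iloc[np.unique(positions)]


# Toy data: "machine" is an assumed grouping key, not from the original post.
df = pd.DataFrame(
    {
        "machine": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "Stop": ["Running", "Running", "Running", "Fault",
                 "Running", "Running", "Fault", "Running"],
    }
)

# Apply the window per group, then stitch the pieces back together.
pieces = [take_range(g, 1) for _, g in df.groupby("machine", sort=False)]
result = pd.concat(pieces)
```

Here row 4 (the first row of machine `b`) is left out even though it directly follows the fault at row 3, because it belongs to a different group. The broadcasting form is used instead of `np.concatenate` over a list because `np.concatenate` raises on an empty list, which would happen for any group with no matching rows.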

To answer your first question:

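    df[pd.concat([df.category.shift(-i)=='A' for i in range(n+1)], axis=1).any(axis=1)]

Note the `range(n+1)`: `i=0` matches the A rows themselves, and `i=1..n` match the n rows before each of them. With `range(n)` you would only get n-1 preceding rows.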

Hopefully you can extend the same (admittedly somewhat clumsy) approach to cover more cases.
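As one possible extension, the "n rows around" case can be handled by shifting in both directions. This is a sketch under the assumption that the frame looks like the question's sample (the categories are hard-coded here and the values are placeholders); `fill_value=False` keeps the shifted masks boolean at the edges.

```python
import numpy as np
import pandas as pd

# Same categories as the question's sample; the values are placeholders.
df = pd.DataFrame(
    {"category": list("DAABBBDDBAD"), "value": np.arange(11) / 10},
    index=pd.date_range("2015-04-20", "2015-04-30"),
)

n = 2
is_a = df["category"] == "A"

# shift(-i) with i > 0 flags rows that sit i places before an A,
# shift(-i) with i < 0 flags rows that sit |i| places after an A,
# and i = 0 flags the A rows themselves.
shifted = [is_a.shift(-i, fill_value=False) for i in range(-n, n + 1)]
mask = pd.concat(shifted, axis=1).any(axis=1)

around = df[mask]
```

This builds 2n+1 shifted copies of the mask, so it is probably best suited to small n.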
