Selecting rows before and after rows of interest in Pandas

Question

Let's say I have a time series dataframe with a categorical variable and a value:

    In [4]: df = pd.DataFrame(data={'category': np.random.choice(['A', 'B', 'C', 'D'], 11),
       ...:                         'value': np.random.rand(11)},
       ...:                   index=pd.date_range('2015-04-20', '2015-04-30'))

    In [5]: df
    Out[5]:
               category     value
    2015-04-20        D  0.220804
    2015-04-21        A  0.992445
    2015-04-22        A  0.743648
    2015-04-23        B  0.337535
    2015-04-24        B  0.747340
    2015-04-25        B  0.839823
    2015-04-26        D  0.292628
    2015-04-27        D  0.906340
    2015-04-28        B  0.244044
    2015-04-29        A  0.070764
    2015-04-30        D  0.132221
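Because `np.random.choice` and `np.random.rand` give different data on every run, the outputs in this thread are hard to reproduce. A minimal sketch that rebuilds the exact frame printed above from literal values (copied from the `Out[5]` listing) might look like this:

```python
import pandas as pd

# Values copied from the Out[5] listing above, so every run sees the same frame.
df = pd.DataFrame(
    data={
        "category": ["D", "A", "A", "B", "B", "B", "D", "D", "B", "A", "D"],
        "value": [0.220804, 0.992445, 0.743648, 0.337535, 0.747340, 0.839823,
                  0.292628, 0.906340, 0.244044, 0.070764, 0.132221],
    },
    index=pd.date_range("2015-04-20", "2015-04-30"),
)

print(df)
```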

If I'm interested in rows with category A, filtering to isolate them is trivial. But what if I'm also interested in the n rows before each category-A row? With n=2, I'd like to see something like:

    In [5]: df[some boolean indexing]
    Out[5]:
               category     value
    2015-04-20        D  0.220804
    2015-04-21        A  0.992445
    2015-04-22        A  0.743648
    2015-04-27        D  0.906340
    2015-04-28        B  0.244044
    2015-04-29        A  0.070764

Similarly, what if I'm interested in the n rows around each category-A row, i.e. n rows before and n rows after? Again with n=2, I'd like to see this:

    In [5]: df[some other boolean indexing]
    Out[5]:
               category     value
    2015-04-20        D  0.220804
    2015-04-21        A  0.992445
    2015-04-22        A  0.743648
    2015-04-23        B  0.337535
    2015-04-24        B  0.747340
    2015-04-27        D  0.906340
    2015-04-28        B  0.244044
    2015-04-29        A  0.070764
    2015-04-30        D  0.132221

Thanks!

You might find this helpful: stackoverflow.com/questions/28837633/… (comment by A. Entuluva, Feb 9, 2017 at 22:14)

MaxU - stand with Ukraine · Accepted Answer · 2017-02-09 22:43:03Z

Selecting the n rows around each category-A row:

    In [223]: idx = df.index.get_indexer_for(df[df.category=='A'].index)

    In [224]: n = 1

    In [225]: df.iloc[np.unique(np.concatenate([np.arange(max(i-n,0), min(i+n+1, len(df))) for i in idx]))]
    Out[225]:
               category     value
    2015-04-20        D  0.220804
    2015-04-21        A  0.992445
    2015-04-22        A  0.743648
    2015-04-23        B  0.337535
    2015-04-28        B  0.244044
    2015-04-29        A  0.070764
    2015-04-30        D  0.132221

    In [226]: n = 2

    In [227]: df.iloc[np.unique(np.concatenate([np.arange(max(i-n,0), min(i+n+1, len(df))) for i in idx]))]
    Out[227]:
               category     value
    2015-04-20        D  0.220804
    2015-04-21        A  0.992445
    2015-04-22        A  0.743648
    2015-04-23        B  0.337535
    2015-04-24        B  0.747340
    2015-04-27        D  0.906340
    2015-04-28        B  0.244044
    2015-04-29        A  0.070764
    2015-04-30        D  0.132221
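A vectorized alternative to building one `np.arange` per matching row, sketched here as an assumption rather than as part of this answer: flag the category-A rows as 0/1 and take a rolling maximum over a window of 2n+1 rows centred on each row. For the "n rows before" case, a forward-looking window of n+1 rows built with `pandas.api.indexers.FixedForwardWindowIndexer` (available in pandas 1.1 and later, as far as I know) does the same job. The helper names `rows_around` and `rows_before` are hypothetical, and like the index-based answer this works on row positions, so the frame should already be in time order.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Frame rebuilt from the question's printed output.
df = pd.DataFrame(
    {"category": list("DAABBBDDBAD"),
     "value": [0.220804, 0.992445, 0.743648, 0.337535, 0.747340, 0.839823,
               0.292628, 0.906340, 0.244044, 0.070764, 0.132221]},
    index=pd.date_range("2015-04-20", "2015-04-30"),
)


def rows_around(frame, target, n):
    """Rows within n positions (either side) of a row whose category == target."""
    hit = frame["category"].eq(target).astype(int)
    # Centred window of 2n+1 rows; min_periods=1 keeps the clipped windows at the edges.
    near = hit.rolling(2 * n + 1, center=True, min_periods=1).max()
    return frame[near.gt(0)]


def rows_before(frame, target, n):
    """Each target row plus the n rows preceding it."""
    hit = frame["category"].eq(target).astype(int)
    # Forward-looking window: row j looks at positions j .. j+n.
    indexer = FixedForwardWindowIndexer(window_size=n + 1)
    near = hit.rolling(window=indexer, min_periods=1).max()
    return frame[near.gt(0)]


print(rows_around(df, "A", 2))
print(rows_before(df, "A", 2))
```

The rolling maximum avoids the Python-level loop over matching rows, at the cost of being a little less obvious to read than the explicit index ranges.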

Any thoughts on how this could be solved when looking at a grouped dataframe?
I did something like this:

    def takeRange(df: pd.DataFrame, range: int):
        idx = df.index.get_indexer_for(df[df["Stop"] != "Running"].index)
        return df.iloc[
            np.unique(
                np.concatenate(
                    [np.arange(max(i - range, 0), min(i + range + 1, len(df))) for i in idx]
                )
            )
        ]
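For the grouped case raised in this comment, one possible sketch (an assumption, not taken from either answer) is to shift the 0/1 flags within each group with groupby `shift`, so a window never crosses a group boundary. The `group` column, the helper name `rows_around_per_group` and the toy data are hypothetical.

```python
import pandas as pd


def rows_around_per_group(frame, key, mask, n):
    """Rows within n positions of a row where mask is True, never crossing a group boundary.

    frame: the DataFrame, in the row order the windows should follow.
    key:   name of the (hypothetical) grouping column.
    mask:  boolean Series aligned with frame marking the rows of interest.
    """
    hit = mask.astype(int)
    grouped = hit.groupby(frame[key])
    # shift(i) within each group: row j sees the flag i positions earlier in its own
    # group (negative i looks ahead); positions outside the group are filled with 0.
    shifted = [grouped.shift(i, fill_value=0) for i in range(-n, n + 1)]
    near = pd.concat(shifted, axis=1).max(axis=1)
    return frame[near.gt(0)]


# Toy data: two hypothetical groups, one row of interest at the end of group "x".
df = pd.DataFrame({
    "group": ["x", "x", "x", "y", "y", "y"],
    "category": ["B", "B", "A", "B", "B", "B"],
})

selected = rows_around_per_group(df, "group", df["category"].eq("A"), 1)
print(selected)  # rows 1 and 2 only; row 3 belongs to group "y"
```

Passing a boolean mask rather than a target value keeps conditions like `df["Stop"] != "Running"` safe: shifted-in gaps are filled with 0 instead of NaN, so they can never count as a match. With the comment's data the call would be something like `rows_around_per_group(df, "<group column>", df["Stop"] != "Running", n)`, where the group column name is a placeholder.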

DYZ · Accepted Answer · 2017-02-09 22:21:41Z

To answer your first question:

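    df[pd.concat([df.category.shift(-i) == 'A' for i in range(n + 1)], axis=1).any(axis=1)]

Offset 0 keeps the A rows themselves and offsets 1..n pick up the n rows before each of them, so the range has to run up to n inclusive (`range(n)` would stop one row short).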

Hopefully you can extend the same (admittedly somewhat clumsy) approach to cover the other cases.
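Following that suggestion, a sketch of the same shift-based idea for the "n rows around" case simply shifts in both directions, using offsets from -n to n. The frame is rebuilt from the question's printed values; `around` is just an illustrative name.

```python
import pandas as pd

# Frame rebuilt from the question's printed output.
df = pd.DataFrame(
    {"category": list("DAABBBDDBAD"),
     "value": [0.220804, 0.992445, 0.743648, 0.337535, 0.747340, 0.839823,
               0.292628, 0.906340, 0.244044, 0.070764, 0.132221]},
    index=pd.date_range("2015-04-20", "2015-04-30"),
)
n = 2

# shift(i) moves values down, so row j sees row j - i; negative i looks ahead.
# Offsets -n..n therefore flag every row within n positions of an 'A'.
around = df[pd.concat([df.category.shift(i) == "A" for i in range(-n, n + 1)], axis=1).any(axis=1)]
print(around)
```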
