Pandas DataFrame get rows where index matches a certain condition

Question

I have a Pandas dataframe from which I need all of the rows where the index matches a certain condition. The dataframe has a MultiIndex, and I need the rows where the first index, the TimeStamp, is in a specific range. Level 1 of the MultiIndex is a Series of DateTime objects. This following line of code works to check if the month is equal to 5:

compare[compare.index.get_level_values(0).month == 5]

But when I modify the code to check for the rows where the value is in a certain array

compare[compare.index.get_level_values(0).month in [5, 6, 7]]

I get the error

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I also tried using df.loc to get the values.

compare.loc[compare.index.get_level_values(0).month in [5, 6, 7]]

But this results in the same error.

I also tried using the isin method instead.

compare[compare.index.get_level_values(0).month.isin([5, 6, 7])]

But this results in the following attribute error:

AttributeError: 'numpy.ndarray' object has no attribute 'isin'

How do I get the rows of the DataFrame where the index meets a specific condition?

can you post an output of print(compare.index.get_level_values(0)[:5]) ? And what is your Pandas version? — MaxU - stand with Ukraine
– MaxU - stand with Ukraine, Commented Aug 11, 2017 at 23:11
DatetimeIndex(['2016-01-04 01:40:00', '2016-01-04 02:00:00', '2016-01-04 02:10:00', '2016-01-04 02:30:00', '2016-01-04 02:40:00'], dtype='datetime64[ns]', name=u'TTimeStamp', freq=None) is the output I get from the print statement. I am using Pandas 0.20.3 — victor
– victor, Commented Aug 11, 2017 at 23:36
please see UPDATE in my answer - i can't reproduce your error... I'm using Pandas 0.20.1 — MaxU - stand with Ukraine
– MaxU - stand with Ukraine, Commented Aug 11, 2017 at 23:39

MaxU - stand with Ukraine · Accepted Answer · 2017-08-11 23:45:50Z

Try this:

compare[compare.index.get_level_values(0).month.isin([5, 6, 7])]

PS this should work for Pandas version 0.18.1+

Demo:

In [45]: import pandas_datareader.data as web In [46]: df = web.DataReader('AAPL', 'google', '2017-06-01') In [48]: df = df.assign(i2=np.arange(len(df))).set_index('i2', append=True) In [49]: df Out[49]: Open High Low Close Volume Date i2 2017-06-01 0 153.17 153.33 152.22 153.18 16404088 2017-06-02 1 153.58 155.45 152.89 155.45 27770715 2017-06-05 2 154.34 154.45 153.46 153.93 25331662 2017-06-06 3 153.90 155.81 153.78 154.45 26624926 2017-06-07 4 155.02 155.98 154.48 155.37 21069647 2017-06-08 5 155.25 155.54 154.40 154.99 21250798 2017-06-09 6 155.19 155.19 146.02 148.98 64882657 2017-06-12 7 145.74 146.09 142.51 145.42 72307330 2017-06-13 8 147.16 147.45 145.15 146.59 34165445 2017-06-14 9 147.50 147.50 143.84 145.16 31531232 ... ... ... ... ... ... 2017-07-31 41 149.90 150.33 148.13 148.73 19845920 2017-08-01 42 149.10 150.22 148.41 150.05 35368645 2017-08-02 43 159.28 159.75 156.16 157.14 69936800 2017-08-03 44 157.05 157.21 155.02 155.57 27097296 2017-08-04 45 156.07 157.40 155.69 156.39 20559852 2017-08-07 46 157.06 158.92 156.67 158.81 21870321 2017-08-08 47 158.60 161.83 158.27 160.08 36205896 2017-08-09 48 159.26 161.27 159.11 161.06 26131530 2017-08-10 49 159.90 160.00 154.63 155.32 40804273 2017-08-11 50 156.60 158.57 156.07 157.48 26180743 [51 rows x 5 columns] In [50]: df[df.index.get_level_values(0).month.isin([5,8])] Out[50]: Open High Low Close Volume Date i2 2017-08-01 42 149.10 150.22 148.41 150.05 35368645 2017-08-02 43 159.28 159.75 156.16 157.14 69936800 2017-08-03 44 157.05 157.21 155.02 155.57 27097296 2017-08-04 45 156.07 157.40 155.69 156.39 20559852 2017-08-07 46 157.06 158.92 156.67 158.81 21870321 2017-08-08 47 158.60 161.83 158.27 160.08 36205896 2017-08-09 48 159.26 161.27 159.11 161.06 26131530 2017-08-10 49 159.90 160.00 154.63 155.32 40804273 2017-08-11 50 156.60 158.57 156.07 157.48 26180743

UPDATE: test with your index values:

In [56]: i = pd.DatetimeIndex(['2016-01-04 01:40:00', '2016-01-04 02:00:00', '2016-01-04 02:10:00', '2016-01-04 02:30:00', '2016-01-04 02:4 ...: 0:00'], dtype='datetime64[ns]', name=u'TTimeStamp', freq=None) In [57]: i Out[57]: DatetimeIndex(['2016-01-04 01:40:00', '2016-01-04 02:00:00', '2016-01-04 02:10:00', '2016-01-04 02:30:00', '2016-01-04 02:40:00'], dtype='datetime64[ns]', name='TTimeStamp', freq=None) In [58]: i.month Out[58]: Int64Index([1, 1, 1, 1, 1], dtype='int64', name='TTimeStamp') In [59]: i.month.isin([2,3]) Out[59]: array([False, False, False, False, False], dtype=bool) In [60]: i.month.isin([1,2,3]) Out[60]: array([ True, True, True, True, True], dtype=bool)

UPDATE2: try the following workaround:

compare[pd.Series(compare.index.get_level_values(0).month).isin([5, 6, 7]).values]

Sorry, I forgot to add that I tried this too and it didn't work...question is edited with the error I got too
I created the DatetimeIndex and called i.month. Instead of creating an Int54Index, it instead creates a ndarray with dtype int32. It seems this is the source of the error, where the call to month creates an ndarray instead of an Index object. I checked the original index, and the same thing happened--including the month attribute causes the DatetimeIndex to be converted to an ndarray. I haven't found the solution yet, though.
All right, that worked perfectly! Thanks for the workaround! I'm still curious as to why the ndarray is created from the DatetimeIndex object though--I'll continue looking around with that and post here if I figure out why.

Collectives™ on Stack Overflow

Pandas DataFrame get rows where index matches a certain condition

1 Answer 1

4 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Linked

Related