Slice Pandas DataFrame by Row

Question

I am working with survey data loaded from an h5-file as hdf = pandas.HDFStore('Survey.h5') through the pandas package. Within this DataFrame, all rows are the results of a single survey, whereas the columns are the answers for all questions within a single survey.

I am aiming to reduce this dataset to a smaller DataFrame including only the rows with a certain depicted answer on a certain question, i.e. with all the same value in this column. I am able to determine the index values of all rows with this condition, but I can't find how to delete this rows or make a new df with these rows only.

Wouter Overmeire · Accepted Answer · 2012-08-09 11:26:08Z

In [36]: df Out[36]: A B C D a 0 2 6 0 b 6 1 5 2 c 0 2 6 0 d 9 3 2 2 In [37]: rows Out[37]: ['a', 'c'] In [38]: df.drop(rows) Out[38]: A B C D b 6 1 5 2 d 9 3 2 2 In [39]: df[~((df.A == 0) & (df.B == 2) & (df.C == 6) & (df.D == 0))] Out[39]: A B C D b 6 1 5 2 d 9 3 2 2 In [40]: df.ix[rows] Out[40]: A B C D a 0 2 6 0 c 0 2 6 0 In [41]: df[((df.A == 0) & (df.B == 2) & (df.C == 6) & (df.D == 0))] Out[41]: A B C D a 0 2 6 0 c 0 2 6 0

is it possible to slice the dataframe and say (c = 5 or c =6) like THIS: ---> df[((df.A == 0) & (df.B == 2) & (df.C == 5 or 6) & (df.D == 0))]
df[((df.A == 0) & (df.B == 2) & df.C.isin([5, 6]) & (df.D == 0))] or df[((df.A == 0) & (df.B == 2) & ((df.C == 5) | (df.C == 6)) & (df.D == 0))]

Akavall · Accepted Answer · 2016-12-11 03:14:24Z

If you already know the index you can use .loc:

In [12]: df = pd.DataFrame({"a": [1,2,3,4,5], "b": [4,5,6,7,8]}) In [13]: df Out[13]: a b 0 1 4 1 2 5 2 3 6 3 4 7 4 5 8 In [14]: df.loc[[0,2,4]] Out[14]: a b 0 1 4 2 3 6 4 5 8 In [15]: df.loc[1:3] Out[15]: a b 1 2 5 2 3 6 3 4 7

It's worth a quick note that despite the notational similarity between df.loc[1:3] and some_list[1:3], the first uses an inclusive upper index while the second (and most of python) uses an exclusive upper index.

Neil · Accepted Answer · 2022-03-24 12:07:42Z

1

If you just need to get the top rows; you can use df.head(10)

answered Mar 24, 2022 at 12:07

Neil

8,8226 gold badges55 silver badges62 bronze badges

Comments

rachwa · Accepted Answer · 2022-06-10 20:00:15Z

1

Use query to search for specific conditions:

In [3]: df Out[3]: age family name 0 1 A john 1 36 A jason 2 32 A jane 3 26 B jack 4 30 B james In [4]: df.query('age > 30 & family == "A"') Out[4]: age family name 1 36 A jason 2 32 A jane

answered Jun 10, 2022 at 20:00

rachwa

2,3901 gold badge21 silver badges20 bronze badges

2 Comments

akalanka Over a year ago

After the query, how would I slice columns? For example, only display age and name.

rachwa Over a year ago

Hi, if you want to use query, try df.query('age > 30 & family == "A"')[['age', 'name']]. Alternatively, you can use loc: df.loc[(df['age'] > 30) & (df['family'] == 'A'), ['age', 'name']].

Collectives™ on Stack Overflow

Slice Pandas DataFrame by Row

4 Answers 4

2 Comments

1 Comment

Comments

2 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

2 Comments

1 Comment

Comments

2 Comments

Linked

Related