53

I am working with survey data loaded from an h5-file as hdf = pandas.HDFStore('Survey.h5') through the pandas package. Within this DataFrame, all rows are the results of a single survey, whereas the columns are the answers for all questions within a single survey.

I am aiming to reduce this dataset to a smaller DataFrame including only the rows with a certain depicted answer on a certain question, i.e. with all the same value in this column. I am able to determine the index values of all rows with this condition, but I can't find how to delete this rows or make a new df with these rows only.

4 Answers 4

51
In [36]: df Out[36]: A B C D a 0 2 6 0 b 6 1 5 2 c 0 2 6 0 d 9 3 2 2 In [37]: rows Out[37]: ['a', 'c'] In [38]: df.drop(rows) Out[38]: A B C D b 6 1 5 2 d 9 3 2 2 In [39]: df[~((df.A == 0) & (df.B == 2) & (df.C == 6) & (df.D == 0))] Out[39]: A B C D b 6 1 5 2 d 9 3 2 2 In [40]: df.ix[rows] Out[40]: A B C D a 0 2 6 0 c 0 2 6 0 In [41]: df[((df.A == 0) & (df.B == 2) & (df.C == 6) & (df.D == 0))] Out[41]: A B C D a 0 2 6 0 c 0 2 6 0 
Sign up to request clarification or add additional context in comments.

2 Comments

is it possible to slice the dataframe and say (c = 5 or c =6) like THIS: ---> df[((df.A == 0) & (df.B == 2) & (df.C == 5 or 6) & (df.D == 0))]
df[((df.A == 0) & (df.B == 2) & df.C.isin([5, 6]) & (df.D == 0))] or df[((df.A == 0) & (df.B == 2) & ((df.C == 5) | (df.C == 6)) & (df.D == 0))]
27

If you already know the index you can use .loc:

In [12]: df = pd.DataFrame({"a": [1,2,3,4,5], "b": [4,5,6,7,8]}) In [13]: df Out[13]: a b 0 1 4 1 2 5 2 3 6 3 4 7 4 5 8 In [14]: df.loc[[0,2,4]] Out[14]: a b 0 1 4 2 3 6 4 5 8 In [15]: df.loc[1:3] Out[15]: a b 1 2 5 2 3 6 3 4 7 

1 Comment

It's worth a quick note that despite the notational similarity between df.loc[1:3] and some_list[1:3], the first uses an inclusive upper index while the second (and most of python) uses an exclusive upper index.
1

If you just need to get the top rows; you can use df.head(10)

Comments

1

Use query to search for specific conditions:

In [3]: df Out[3]: age family name 0 1 A john 1 36 A jason 2 32 A jane 3 26 B jack 4 30 B james In [4]: df.query('age > 30 & family == "A"') Out[4]: age family name 1 36 A jason 2 32 A jane 

2 Comments

After the query, how would I slice columns? For example, only display age and name.
Hi, if you want to use query, try df.query('age > 30 & family == "A"')[['age', 'name']]. Alternatively, you can use loc: df.loc[(df['age'] > 30) & (df['family'] == 'A'), ['age', 'name']].

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.