Pandas: how to select rows based on two conditions in the same column

Question

I have the following table, and I am trying to select the 'col1' and 'col2' pairs that, at some point in the dataframe, appear with the values 'D' and 'E' in column 'C'.

    col1  col2   C  val
    aaa   rte_1  D  58
    aaa   rte_2  E  47
    bbb   rte_3  D  2
    aaa   rte_4  E  35
    aaa   rte_5  E  95
    ttt   rte_6  E  84
    aaa   rte_1  D  57
    ddd   rte_2  C  36
    aaa   rte_3  C  13
    aaa   rte_4  C  28
    aaa   rte_5  E  3

In other words, the result should be

    col1  col2   C  val
    aaa   rte_1  D  58
    aaa   rte_5  E  95
    aaa   rte_1  D  57
    aaa   rte_5  E  3

I have tried something like this, but I get an empty dataframe, so it is obviously wrong.

    import pandas as pd

    d = {'col1': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ttt', 'aaa', 'ddd', 'aaa', 'aaa', 'aaa'],
         'col2': ['rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5', 'rte_6', 'rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5'],
         'C': ['D', 'E', 'D', 'E', 'E', 'E', 'D', 'C', 'C', 'C', 'E'],
         'val': ['58', '47', '2', '35', '95', '84', '57', '36', '13', '28', '3']}
    df = pd.DataFrame(d)
    # Attempt: keep rows where C is 'D' and C is 'E', then take the pair columns
    df2 = df.loc[(df.C == 'D') & (df.C == 'E'), ['col1', 'col2']]

How can I do this?

EDIT: When I say that I want to select values that have both "E" and "D", I mean that I want to select the rows that share the same 'col1' and 'col2' pair, have a "D", and then, at some point in the dataframe, occur again with an "E" (or vice versa, "E" first and "D" later). I hope this clarifies the question.

Do you mean ALL col C values for the group have to be either D or E? – David Erickson, Commented Dec 19, 2020 at 20:35
Thank you for the replies! I have included the desired output in the question. Column C should have 'D' and 'E'. I will edit the question to make it clearer, as I think some may have misunderstood this. – LeoLore, Commented Dec 19, 2020 at 21:14
The pair aaa, rte_1 is in your expected output, but it only has a D value followed by another D value, so it doesn't have any E values, which contradicts what you are saying "Column C should have 'D' and 'E'." My answer gives you the expected output. Can you please confirm if that is what you are trying to do? – David Erickson, Commented Dec 19, 2020 at 21:22
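Taken literally, the EDIT asks for pairs whose C values include at least one 'D' and at least one 'E', in either order. A minimal sketch of that reading, assuming the data from the question, flags each row with groupby + transform('any'); the names keys, has_d, has_e and both are illustrative only. On this data it returns an empty frame, which is the contradiction raised in the comment above: aaa/rte_1 only ever has 'D', and aaa/rte_5 only ever has 'E'.

```python
import pandas as pd

# Data from the question ('val' kept as strings, as in the original dict)
df = pd.DataFrame({
    'col1': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ttt', 'aaa', 'ddd', 'aaa', 'aaa', 'aaa'],
    'col2': ['rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5', 'rte_6', 'rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5'],
    'C': ['D', 'E', 'D', 'E', 'E', 'E', 'D', 'C', 'C', 'C', 'E'],
    'val': ['58', '47', '2', '35', '95', '84', '57', '36', '13', '28', '3'],
})

# Group by the (col1, col2) pair
keys = [df['col1'], df['col2']]

# transform('any') broadcasts each group's result back to every row of that group
has_d = df['C'].eq('D').groupby(keys).transform('any')  # pair has at least one 'D'
has_e = df['C'].eq('E').groupby(keys).transform('any')  # pair has at least one 'E'

# Keep rows whose pair has both; order of appearance does not matter
both = df[has_d & has_e]
print(both)  # empty for this data: no pair has both a 'D' and an 'E'
```

Because the flags are computed per pair rather than per row, this avoids the problem in the original attempt, where a single cell was tested for two different values.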

David Erickson · Accepted Answer · 2020-12-19 21:18:07Z

What it sounds like you may be trying to do is check whether ALL values of C in a group are only 'D' or 'E'. At the same time, your expected output also excludes groups that meet that condition but have only one row. You can groupby the "pair" columns you mentioned and use a list comprehension inside all([...]) to check whether every value is either D or E. I have also added the condition and len(x) > 1, since your output excludes groups with only one row. Merging this per-group result back onto df creates a boolean series s that is True where the condition is met, which you can use to filter the dataframe directly and get the "expected output".

    s = df.merge(df.groupby(['col1', 'col2'])['C']
                   .apply(lambda x: all([True if y in ['D', 'E'] and len(x) > 1 else False for y in x]))
                   .reset_index(),
                 how='left', on=['col1', 'col2']).iloc[:, -1]
    df[s]

    Out[1]:
       col1   col2  C val
    0   aaa  rte_1  D  58
    4   aaa  rte_5  E  95
    6   aaa  rte_1  D  57
    10  aaa  rte_5  E   3
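The same two rules (every C value in the pair is 'D' or 'E', and the pair occurs more than once) can also be sketched without the merge, since groupby(...).transform returns a result already aligned with df's index. This is an alternative formulation, not the answer's own code; it assumes the question's data, and the names keys, all_d_or_e, repeated and result are illustrative.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'col1': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ttt', 'aaa', 'ddd', 'aaa', 'aaa', 'aaa'],
    'col2': ['rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5', 'rte_6', 'rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5'],
    'C': ['D', 'E', 'D', 'E', 'E', 'E', 'D', 'C', 'C', 'C', 'E'],
    'val': ['58', '47', '2', '35', '95', '84', '57', '36', '13', '28', '3'],
})

keys = [df['col1'], df['col2']]

# Rule 1: every C value in the pair is 'D' or 'E'
all_d_or_e = df['C'].isin(['D', 'E']).groupby(keys).transform('all')

# Rule 2: the pair appears in more than one row
repeated = df['C'].groupby(keys).transform('size') > 1

# Both masks share df's index, so they can be combined and used directly
result = df[all_d_or_e & repeated]
print(result)
```

Using the built-in 'all' and 'size' reductions keeps the per-group work vectorized instead of running a Python lambda per group.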

Amit · Accepted Answer · 2020-12-19 20:42:17Z

Your condition asks for rows where column C is both D and E at the same time, which no single row can satisfy. I think you meant OR, and for that the operator is "|". Here is a small change to your code:

    import pandas as pd

    d = {'col1': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ttt', 'aaa', 'ddd', 'aaa', 'aaa', 'aaa'],
         'col2': ['rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5', 'rte_6', 'rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5'],
         'C': ['D', 'E', 'D', 'E', 'E', 'E', 'D', 'C', 'C', 'C', 'E'],
         'val': ['58', '47', '2', '35', '95', '84', '57', '36', '13', '28', '3']}
    df = pd.DataFrame(d)
    # | is element-wise OR: keep rows where C is 'D' or 'E'
    df2 = df[(df.C == 'D') | (df.C == 'E')]
    print(df2)

For the data you gave, it produces this output:

       col1   col2  C val
    0   aaa  rte_1  D  58
    1   aaa  rte_2  E  47
    2   bbb  rte_3  D   2
    3   aaa  rte_4  E  35
    4   aaa  rte_5  E  95
    5   ttt  rte_6  E  84
    6   aaa  rte_1  D  57
    10  aaa  rte_5  E   3
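A quick check of why the & mask is always empty while | is not, assuming the question's data; is_d and is_e are illustrative names. Note that the | filter works row by row, so it also keeps rows 1, 2, 3 and 5, which are not in the question's expected output; matching that output requires a per-pair (groupby) condition.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'col1': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ttt', 'aaa', 'ddd', 'aaa', 'aaa', 'aaa'],
    'col2': ['rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5', 'rte_6', 'rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5'],
    'C': ['D', 'E', 'D', 'E', 'E', 'E', 'D', 'C', 'C', 'C', 'E'],
    'val': ['58', '47', '2', '35', '95', '84', '57', '36', '13', '28', '3'],
})

is_d = df['C'] == 'D'
is_e = df['C'] == 'E'

# & is element-wise AND: a single cell cannot equal both 'D' and 'E'
print((is_d & is_e).sum())  # 0, hence the empty dataframe
# | is element-wise OR: rows where C is 'D' or 'E'
print((is_d | is_e).sum())  # 8
```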

Gerges · Accepted Answer · 2020-12-19 20:51:04Z

You may also use the isin method, which can be handy when you have a lot of conditions. You can use it like this:

df[df.C.isin(['D', 'E'])]

to select rows and all columns, and this way to get only the two columns you need:

df.loc[df.C.isin(['D', 'E']), ['col1', 'col2']]
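A small sketch, assuming the question's data, confirming that isin builds the same boolean mask as chaining == with |. To pick rows and columns in one step, the mask and the column list go together inside .loc; plain df[...] does not accept a (rows, columns) pair. The names mask, rows and pairs are illustrative.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'col1': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ttt', 'aaa', 'ddd', 'aaa', 'aaa', 'aaa'],
    'col2': ['rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5', 'rte_6', 'rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5'],
    'C': ['D', 'E', 'D', 'E', 'E', 'E', 'D', 'C', 'C', 'C', 'E'],
    'val': ['58', '47', '2', '35', '95', '84', '57', '36', '13', '28', '3'],
})

# One membership test instead of several == comparisons joined with |
mask = df['C'].isin(['D', 'E'])

rows = df[mask]                           # matching rows, all columns
pairs = df.loc[mask, ['col1', 'col2']]    # matching rows, only the pair columns

# Same mask as the chained comparison
print(mask.equals((df['C'] == 'D') | (df['C'] == 'E')))  # True
print(pairs)
```

With many allowed values, isin scales better to read and maintain than a long chain of | conditions.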
