I have the following table and I am trying to select the 'col1' and 'col2' pairs that, at some point in the dataframe, appear with the values 'D' and 'E' in column 'C'.
    col1  col2   C  val
    aaa   rte_1  D  58
    aaa   rte_2  E  47
    bbb   rte_3  D  2
    aaa   rte_4  E  35
    aaa   rte_5  E  95
    ttt   rte_6  E  84
    aaa   rte_1  D  57
    ddd   rte_2  C  36
    aaa   rte_3  C  13
    aaa   rte_4  C  28
    aaa   rte_5  E  3

In other words, the result should be
    col1  col2   C  val
    aaa   rte_1  D  58
    aaa   rte_5  E  95
    aaa   rte_1  D  57
    aaa   rte_5  E  3

I have tried something like this, but I get an empty dataframe, so it is obviously wrong.
    import pandas as pd

    d = {'col1': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ttt', 'aaa', 'ddd', 'aaa', 'aaa', 'aaa'],
         'col2': ['rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5', 'rte_6', 'rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5'],
         'C': ['D', 'E', 'D', 'E', 'E', 'E', 'D', 'C', 'C', 'C', 'E'],
         'val': ['58', '47', '2', '35', '95', '84', '57', '36', '13', '28', '3']}
    df = pd.DataFrame(d)
    # A single cell can never equal both 'D' and 'E', so this mask is all False
    df2 = df.loc[(df.C == 'D') & (df.C == 'E')][['col1', 'col2']]

How can I do this?
EDIT: When I say that I want to select values that have both "E" and "D", I mean that I want to select the rows that have the same 'col1' and 'col2' pair and have a "D" and then, at some point later in the dataframe, occur again with an "E" (or vice versa, "E" first and "D" later). I hope this clarifies the question.
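Read literally, the rule in the EDIT could be sketched as below. This is only one interpretation, not a confirmed solution: the helper name `rows_with_both` and the extra row added at the end are hypothetical and exist purely for illustration. On the sample data this literal reading selects nothing, because no ('col1', 'col2') pair ever carries both a "D" and an "E".

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ttt', 'aaa', 'ddd', 'aaa', 'aaa', 'aaa'],
    'col2': ['rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5', 'rte_6',
             'rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5'],
    'C': ['D', 'E', 'D', 'E', 'E', 'E', 'D', 'C', 'C', 'C', 'E'],
    'val': ['58', '47', '2', '35', '95', '84', '57', '36', '13', '28', '3'],
})


def rows_with_both(frame, first='D', second='E'):
    """Hypothetical helper: keep rows whose (col1, col2) pair has BOTH values in C."""
    keys = [frame['col1'], frame['col2']]
    # For each row, does its pair contain `first` / `second` anywhere in the frame?
    has_first = frame['C'].eq(first).groupby(keys).transform('any')
    has_second = frame['C'].eq(second).groupby(keys).transform('any')
    return frame[has_first & has_second]


# Literal reading applied to the sample data from the question
literal = rows_with_both(df)

# Illustration only: add a made-up row so that (aaa, rte_1) has both 'D' and 'E'
extra_row = pd.DataFrame([{'col1': 'aaa', 'col2': 'rte_1', 'C': 'E', 'val': '0'}])
with_extra = rows_with_both(pd.concat([df, extra_row], ignore_index=True))

print(literal)
print(with_extra)
```

The order in which "D" and "E" appear does not matter here, which matches the "or vice versa" part of the EDIT.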
aaa, rte_1 is in your expected output, but only has a D value followed by a D value, so it doesn't have any E values, which contradicts what you are saying: "Column C should have 'D' and 'E'." My answer gives you the expected output. Can you please confirm if that is what you are trying to do?
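For reference, one reading that does reproduce the expected output is "keep rows whose C is 'D' or 'E' and whose ('col1', 'col2', 'C') combination occurs more than once". The sketch below assumes that rule; it is a guess at the intent and not necessarily what the answer mentioned above does.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ttt', 'aaa', 'ddd', 'aaa', 'aaa', 'aaa'],
    'col2': ['rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5', 'rte_6',
             'rte_1', 'rte_2', 'rte_3', 'rte_4', 'rte_5'],
    'C': ['D', 'E', 'D', 'E', 'E', 'E', 'D', 'C', 'C', 'C', 'E'],
    'val': ['58', '47', '2', '35', '95', '84', '57', '36', '13', '28', '3'],
})

# Assumed rule: C must be 'D' or 'E' ...
is_d_or_e = df['C'].isin(['D', 'E'])
# ... and the same (col1, col2, C) combination must appear at least twice.
# keep=False marks every member of a duplicated group, not just the repeats.
repeated = df.duplicated(subset=['col1', 'col2', 'C'], keep=False)

result = df[is_d_or_e & repeated]
print(result)
```

On the sample data this keeps the two (aaa, rte_1, D) rows and the two (aaa, rte_5, E) rows, in their original order.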