How to sort a pandas dataframe on two (or more) different columns, in a particular order

Question

Note

df2 is the only thing that can be used here - using df or df1 would be to use data that isn't possible. The data is received as df2, it wants to be manipulated to the form of df1. Neither df1 or df can be used as part of of the solution (as df1 is the solution).

Setup test data

this is just setup for this post.

# sample data reps = {1: "dog", 2: "ant", 3: "cat", 6: "orange", 7: "apple", 8: "grape"} df = pd.DataFrame( {"one": [1, 1, 1, 2, 2, 2, 3, 3, 3], "two": [6, 7, 8, 6, 7, 8, 6, 7, 8]} ) df = df.replace(reps).copy() df1 = df.copy() df2 = df.sample(frac=1, random_state=1).replace(reps).reset_index(drop=True)

Problem

Sort df1 so that it is in the same order as df1.

df2:

 one two 0 cat grape 1 dog grape 2 cat orange 3 cat apple 4 dog apple 5 dog orange 6 ant apple 7 ant orange 8 ant grape

df1

 one two 0 dog orange 1 dog apple 2 dog grape 3 ant orange 4 ant apple 5 ant grape 6 cat orange 7 cat apple 8 cat grape

Conditions

You can not use df1 as part of a solution, or df, the data is df2, and it needs to be sorted into the ordering of df1.

Attempt

I have tried with pd.Categorical, but couldn't make something work.

order_one = ["dog", "ant", "cat"] order_two = ["orange", "apple", "grape"] df2 = ( df2.groupby(["two"]) .apply(lambda a: a.iloc[pd.Categorical(a["one"], order_one).argsort()]) .reset_index(drop=True) ) df2 = ( df2.groupby(["one"]) .apply(lambda a: a.iloc[pd.Categorical(a["two"], order_two).argsort()]) .reset_index(drop=True) )

Edit

The solution should be based purely on df2, df1 is just part of the test data and to demonstrate how df2 should be sorted. A solution that uses df1 is not viable as this is the result of sorting df2, I can't use that as part of a solution

BENY · Accepted Answer · 2020-05-31 01:30:56Z

Let us try pd.Categorical

df2.one=pd.Categorical(df2.one,categories=df1.one.unique()) df2.two=pd.Categorical(df2.two,categories=df1.two.unique()) df2=df2.sort_values(['one','two']) df2 one two 5 dog orange 4 dog apple 1 dog grape 7 ant orange 6 ant apple 8 ant grape 2 cat orange 3 cat apple 0 cat grape

Make it into function

def yourfunc(x,y): ... for c in x.columns : ... x[c]=pd.Categorical(x[c],categories=y[c].unique()) ... return x.sort_values(x.columns.tolist()) ... yourfunc(df1,df2) one two 8 cat grape 6 cat orange 7 cat apple 2 dog grape 0 dog orange 1 dog apple 5 ant grape 3 ant orange 4 ant apple

Update

order_fruit = ["orange", "apple", "grape"] order_animals = ["dog", "ant", "cat"] def yourfunc(x,y): ... for c, self in zip(x.columns,y) : ... x[c]=pd.Categorical(x[c],categories=self) ... return x.sort_values(x.columns.tolist()) ... yourfunc(df2,[order_animals,order_fruit]) one two 5 dog orange 4 dog apple 1 dog grape 7 ant orange 6 ant apple 8 ant grape 2 cat orange 3 cat apple 0 cat grape

Sorry - df1 here was purely test data, not to be seen. The data as seen would just be that of df2
I'm not sure what you mean by input df, here it should be assumed that df2 is the data received, and that I want to manipulate this to the ordering of df1, I can't use df1 to do that, as it's the result that I want. I do know the ordering that i want though (of course). I don't follow what you're saying with input df though
no, df2 is the only data that you can use, df, df1 are both just tests. I thought this was a lot more obvious than it seems to be so I will update the post, sorry

E.J. White · Accepted Answer · 2020-05-31 03:42:17Z

Technically you are not sorting, since the order is not ascending or descending or alphabetical. You want to order df_2 with some user-specified ordering. You can do this by generating a numerical index based on your custom order, then sorting by that.

order_one = ["dog", "ant", "cat"] order_two = ["orange", "apple", "grape"] # Create dictionaries that define order order_map_one = dict(zip(order_one, range(len(order_one)))) order_map_two = dict(zip(order_two, range(len(order_two)))) # Generate a temp column that maps numerical rank onto column values df_2['order_one_rank'] = df['one'].map(order_map_one) df_2['order_two_rank'] = df['two'].map(order_map_two) # Sort by these temp columns df_2.sort_values(['order_one_rank', 'order_two_rank'], inplace=True) # Then delete the temp columns to recover the original df_2 df_2.drop('order_one_rank', 1, inplace=True) df_2.drop('order_two_rank', 1, inplace=True)

That should leave df_2 in the order you are looking for.

Collectives™ on Stack Overflow

How to sort a pandas dataframe on two (or more) different columns, in a particular order

Note

Setup test data

Problem

Conditions

Attempt

Edit

2 Answers 2

17 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

Note

Setup test data

Problem

Conditions

Attempt

Edit

2 Answers 2

17 Comments

Comments

Related