0

I have a dataset where I would like to sort my data within categories in a specific way.

Data

Location range type Q1 27 Q2 27 Q3 27 Q4 27 NY low re AA 5 0 7 0 NY low re BB 0 0 0 0 NY low re DD 0.51 1 2 2.05 NY low re SS 0 0 0 0 NY low stat AA 1 0.86 1.5 5.27 NY low stat BB 0.45 0.49 0.91 2.17 NY low stat DD 1.53 1.27 2.22 7.78 NY low stat SS 0.26 0.21 0.37 1.3 CA med stat AA 0.38 3 0 0 CA med stat BB 0.33 1.22 0 0.11 CA med stat DD 0.55 5.4 0 0 CA med stat SS 0.09 0.91 0 0 CA med re AA 1.14 0 0 0 CA med re BB 0 0 0 2 CA med re DD 1.02 0 0 0.56 CA med re SS 0 0 0 0 import pandas as pd data = { 'Location': ['NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA'], 'range': ['low re', 'low re', 'low re', 'low re', 'low stat', 'low stat', 'low stat', 'low stat', 'med stat', 'med stat', 'med stat', 'med stat', 'med re', 'med re', 'med re', 'med re'], 'type': ['AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS'], 'Q1 27': [5, 0, 0.51, 0, 1, 0.45, 1.53, 0.26, 0.38, 0.33, 0.55, 0.09, 1.14, 0, 1.02, 0], 'Q2 27': [0, 0, 1, 0, 0.86, 0.49, 1.27, 0.21, 3, 1.22, 5.4, 0.91, 0, 0, 0, 0], 'Q3 27': [7, 0, 2, 0, 1.5, 0.91, 2.22, 0.37, 0, 0, 0, 0, 0, 0, 0, 0], 'Q4 27': [0, 0, 2.05, 0, 5.27, 2.17, 7.78, 1.3, 0, 0.11, 0, 0, 0, 2, 0.56, 0] } df = pd.DataFrame(data) print(df) 

Desired

All types are similar - ex AA belongs w AA BB belongs with BB categorized by Location. Order doesn't matter as long as the types are adjacent to each other categorized by Location. Location range type Q1 27 Q2 27 Q3 27 Q4 27 NY low re AA 5 0 7 0 NY low stat AA 1 0.86 1.5 5.27 NY low re DD 0.51 1 2 2.05 NY low stat DD 1.53 1.27 2.22 7.78 NY low re SS 0 0 0 0 NY low stat SS 0.26 0.21 0.37 1.3 NY low re BB 0 0 0 0 NY low stat BB 0.45 0.49 0.91 2.17 CA med re AA 1.14 0 0 0 CA med stat AA 0.38 3 0 0 CA med re DD 1.02 0 0 0.56 CA med stat DD 0.55 5.4 0 0 CA med re SS 0 0 0 0 CA med stat SS 0.09 0.91 0 0 CA med re BB 0 0 0 2 CA med stat BB 0.33 1.22 0 0.11 

Doing

I am trying to implement

 df.sort_values('Location', 'range', inplace=True) 

However, I am not obtaining my desired result. Any suggestion is appreciated.

6
  • 1
    what is the sorting logic? Commented May 1, 2023 at 18:08
  • All types are similar - ex AA belongs w AA BB belongs with BB categorized by Location @mozway Commented May 1, 2023 at 18:14
  • OK, but what determines the order? Is this always AA/DD/SS/BB? what about low/med re/stat? Commented May 1, 2023 at 18:21
  • this can be in any order @mozway, just showing an example Commented May 1, 2023 at 18:22
  • 1
    So, just df.sort_values(['Location', 'type', 'range'])? Commented May 1, 2023 at 18:24

2 Answers 2

1

I think you need to pass a list of columns to sort by in 'sort_values()' plus sort by the 'type' column. Try this:

df.sort_values(['Location', 'type', 'range'], inplace=True) 

This sorts the DataFrame firstly by 'Location' column, then by 'type' column, and finally by 'range' column. Let me know if this works.

Sign up to request clarification or add additional context in comments.

Comments

1

Your implementation is not very good. sort_values should be included as a parameter in by and the value for ascending or descending should be set. Additionally, because the type order is in the form of 'AA' to 'DD' to 'SS' to 'BB', the range follows a certain pattern. You can use pandas.Categorical to achieve your desired output.

You can try this code here to obtain the desire result:

import pandas as pd data = { 'Location': ['NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA'], 'range': ['low re', 'low re', 'low re', 'low re', 'low stat', 'low stat', 'low stat', 'low stat', 'med stat', 'med stat', 'med stat', 'med stat', 'med re', 'med re', 'med re', 'med re'], 'type': ['AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS'], 'Q1 27': [5, 0, 0.51, 0, 1, 0.45, 1.53, 0.26, 0.38, 0.33, 0.55, 0.09, 1.14, 0, 1.02, 0], 'Q2 27': [0, 0, 1, 0, 0.86, 0.49, 1.27, 0.21, 3, 1.22, 5.4, 0.91, 0, 0, 0, 0], 'Q3 27': [7, 0, 2, 0, 1.5, 0.91, 2.22, 0.37, 0, 0, 0, 0, 0, 0, 0, 0], 'Q4 27': [0, 0, 2.05, 0, 5.27, 2.17, 7.78, 1.3, 0, 0.11, 0, 0, 0, 2, 0.56, 0] } df = pd.DataFrame(data) def range_order(val): if val.endswith('re'): return val + 'a' elif val.endswith('stat'): return val + 'b' else: return val # apply the function to the range column df['range'] = df['range'].apply(range_order) range_order = df['range'].unique().tolist() #Get the range_order from lowest to highest range_order.sort() # define the categorical ordering of the range column df['range'] = pd.Categorical(df['range'], categories=range_order, ordered=True) # define the categorical ordering of the type column type_order = ['AA', 'DD', 'SS', 'BB'] df['type'] = pd.Categorical(df['type'], categories=type_order, ordered=True) df = df.sort_values(by=['Location', 'type', 'range'], ascending=[False, True, True]) # remove the extra letter used in sorting range values df['range'] = df['range'].apply(lambda x: x[:-1]) print(df) 

The output look like:

 Location range type Q1 27 Q2 27 Q3 27 Q4 27 0 NY low re AA 5.00 0.00 7.00 0.00 4 NY low stat AA 1.00 0.86 1.50 5.27 2 NY low re DD 0.51 1.00 2.00 2.05 6 NY low stat DD 1.53 1.27 2.22 7.78 3 NY low re SS 0.00 0.00 0.00 0.00 7 NY low stat SS 0.26 0.21 0.37 1.30 1 NY low re BB 0.00 0.00 0.00 0.00 5 NY low stat BB 0.45 0.49 0.91 2.17 12 CA med re AA 1.14 0.00 0.00 0.00 8 CA med stat AA 0.38 3.00 0.00 0.00 14 CA med re DD 1.02 0.00 0.00 0.56 10 CA med stat DD 0.55 5.40 0.00 0.00 15 CA med re SS 0.00 0.00 0.00 0.00 11 CA med stat SS 0.09 0.91 0.00 0.00 13 CA med re BB 0.00 0.00 0.00 2.00 9 CA med stat BB 0.33 1.22 0.00 0.11 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.