I have the following Pandas DataFrame, which I use to compare the performance of different classifiers over multiple iterations. After each iteration, I add the ranking of each classifier to a DataFrame that holds the cumulative sum of rankings over all iterations (the index of the DataFrame gives the rank from 0 to 3, i.e., there are 4 classifiers in total and 0 is the best).

The DataFrame looks as follows:

    import pandas as pd

    rankings = {'Classifier1': ['1', '2', '1', '0'],
                'Classifier2': ['2', '1', '1', '0'],
                'Classifier3': ['0', '1', '1', '2'],
                'Classifier4': ['1', '0', '1', '2']}
    df = pd.DataFrame(data=rankings)

which formats as:

      Classifier1 Classifier2 Classifier3 Classifier4
    0           1           2           0           1
    1           2           1           1           0
    2           1           1           1           1
    3           0           0           2           2

I would like to create the following boxplot of the different classifiers (as in this paper), using Seaborn or an alternative method:

[Image: the desired boxplot]

1 Answer

First, we need to convert your data to numeric values rather than strings. Then we melt the DataFrame to get it into long format, and finally we draw a boxplot with a swarmplot on top.

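    import pandas as pd
    import seaborn as sns

    # Convert the string values to numbers, then reshape to long format
    df = df.apply(pd.to_numeric).melt(var_name='Classifier', value_name='AUC Rank')

    # Box plot per classifier, with the individual points drawn on top
    ax = sns.boxplot(data=df, x='Classifier', y='AUC Rank')
    ax = sns.swarmplot(data=df, x='Classifier', y='AUC Rank', color='black')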

[Image: box plot with black points]

5 Comments

Hi, thanks for your response. Something does not seem correct though. AUC ranks are all at 1 although they should be between 0-3? For example, Classifier 1 should have one point at 0, two points at 1, one point at 2 and zero points at 3.
@wieus That is exactly what the plot is showing, right? Classifier 1 has four data points (0,1,1,2) and they are all shown as black dots in the plot. Note that there is no way of knowing that 3 is a possible outcome because it does not appear in the data. Hence the axis is scaled only up to 2.
Classifier 4 should have one data point at 0, zero at 1, one at 2 and two at 3, but I don't see that in the graph?
@wieus I misunderstood what your dataframe was showing. Do you have the data at an earlier stage, i.e. the point at which it shows the results for each test in turn? Translating the frequencies of each position into points for the box plot is a bit tricky.
The process of creating rankings for classifiers is shown here: stackoverflow.com/questions/54365492/…
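
Following up on the comments above: if each cell holds the number of iterations in which a classifier received the rank given by the row index (which is how the asker describes Classifier 1 and Classifier 4), one possible way to turn those frequencies into points for a box plot is to repeat each rank as many times as its count. The following is only a minimal sketch under that assumption; the names `counts`, `long_counts`, `expanded`, and `plot_ranks` are illustrative and do not come from the original posts.

```python
import pandas as pd

# Cumulative rank counts from the question. Assumed reading: the row index is
# the rank (0 = best) and each cell is how often the classifier got that rank.
rankings = {'Classifier1': ['1', '2', '1', '0'],
            'Classifier2': ['2', '1', '1', '0'],
            'Classifier3': ['0', '1', '1', '2'],
            'Classifier4': ['1', '0', '1', '2']}
counts = pd.DataFrame(data=rankings).apply(pd.to_numeric)

# Long format: one row per (rank, classifier) pair together with its count.
long_counts = (
    counts.rename_axis('AUC Rank')
    .reset_index()
    .melt(id_vars='AUC Rank', var_name='Classifier', value_name='Count')
)

# Repeat every row 'Count' times so each iteration becomes one observation;
# rows with a count of 0 disappear.
expanded = (
    long_counts.loc[long_counts.index.repeat(long_counts['Count']),
                    ['Classifier', 'AUC Rank']]
    .reset_index(drop=True)
)


def plot_ranks(data):
    """Box plot per classifier with the individual points drawn on top."""
    import seaborn as sns  # imported here so the reshaping works without seaborn

    ax = sns.boxplot(data=data, x='Classifier', y='AUC Rank')
    sns.swarmplot(data=data, x='Classifier', y='AUC Rank', color='black', ax=ax)
    return ax
```

Calling `plot_ranks(expanded)` should then draw Classifier 4 with one point at 0, one at 2, and two at 3, so the axis extends to rank 3. If the per-iteration rankings are still available, plotting those directly would avoid this reconstruction step.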