Creating boxplot from Pandas DataFrame using Seaborn

Question

I have the following Pandas DataFrame, which I use to compare the performance of different classifiers over multiple iterations. After each iteration, I save the ranking of each classifier to a DataFrame that holds the cumulative sum of rankings over all iterations (the index of the DataFrame gives the ranking from 0 to 3, i.e., 4 classifiers in total, with 0 being the best).

The DataFrame looks as follows:

import pandas as pd

rankings = {
    'Classifier1': ['1', '2', '1', '0'],
    'Classifier2': ['2', '1', '1', '0'],
    'Classifier3': ['0', '1', '1', '2'],
    'Classifier4': ['1', '0', '1', '2'],
}
df = pd.DataFrame(data=rankings)

which formats as:

  Classifier1 Classifier2 Classifier3 Classifier4
0           1           2           0           1
1           2           1           1           0
2           1           1           1           1
3           0           0           2           2

I would like to create the following boxplot (as in this paper) of the different classifiers, using Seaborn or an alternative method:

asongtoruin · Accepted Answer · 2019-01-28 12:37:54Z


Firstly, we need to convert your data into numeric values rather than strings. Then, we melt the DataFrame to get it into long format, and finally we draw a boxplot with a swarmplot on top.

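import seaborn as sns

# Convert the string values to numbers, then melt to long format
df = df.apply(pd.to_numeric).melt(var_name='Classifier', value_name='AUC Rank')
# Draw the boxplot, then overlay the individual points
ax = sns.boxplot(data=df, x='Classifier', y='AUC Rank')
ax = sns.swarmplot(data=df, x='Classifier', y='AUC Rank', color='black')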


5 Comments

wieus Over a year ago

Hi, thanks for your response. Something does not seem correct though. AUC ranks are all at 1 although they should be between 0-3? For example, Classifier 1 should have one point at 0, two points at 1, one point at 2 and zero points at 3.

ImportanceOfBeingErnest Over a year ago

@wieus That is exactly what the plot is showing, right? Classifier 1 has four data points (0,1,1,2) and they are all shown as black dots in the plot. Note that there is no way of knowing that 3 is a possible outcome because it does not appear in the data. Hence the axis is scaled only up to 2.

wieus Over a year ago

Classifier 4 should have one data point at 0, zero at 1, one at 2 and two at 3, but I don't see that in the graph?

asongtoruin Over a year ago

@wieus I misunderstood what your dataframe was showing. Do you have the data at an earlier stage, i.e. the point at which it shows the results for each test in turn? Translating the frequencies of each position into points for the box plot is a bit tricky

wieus Over a year ago

The process of creating rankings for classifiers is shown here: stackoverflow.com/questions/54365492/…
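The comments above establish that the DataFrame holds counts (the index is the rank, and each cell is how often a classifier achieved that rank), so the counts have to be expanded into individual observations before a boxplot makes sense. The following is a minimal sketch of one way to do that, assuming that reading of the data is correct; the names `counts` and `long_df` are illustrative and do not appear in the original post.

```python
import numpy as np
import pandas as pd

# Count table from the question: index = rank (0 is best),
# cell = number of iterations in which the classifier got that rank.
rankings = {
    'Classifier1': ['1', '2', '1', '0'],
    'Classifier2': ['2', '1', '1', '0'],
    'Classifier3': ['0', '1', '1', '2'],
    'Classifier4': ['1', '0', '1', '2'],
}
counts = pd.DataFrame(data=rankings).apply(pd.to_numeric)

# For each classifier, repeat every rank as many times as it was observed,
# producing one row per (classifier, observed rank) pair.
long_df = pd.concat(
    [
        pd.DataFrame({
            'Classifier': name,
            'AUC Rank': np.repeat(counts.index.to_numpy(), col.to_numpy()),
        })
        for name, col in counts.items()
    ],
    ignore_index=True,
)
```

With the data in this shape, `long_df` can be passed to `sns.boxplot(data=long_df, x='Classifier', y='AUC Rank')` and `sns.swarmplot` in the same way as in the answer, and rank 3 then appears on the axis because it is present in the expanded data.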
