0

I am plotting two different box plots with pandas with this:

plt.figure() df['mean_train_score_error'] = [1] - df['mean_train_score'] df.boxplot(column=['mean_train_score_error'], by='modelo', medianprops = medianprops, autorange=True,showfliers=False, patch_artist=True, vert=True, showmeans=True,meanline=True) plt.ylabel('Error: 1-F1 Score') plt.title('Error de entrenamiento') plt.suptitle('') df['mean_test_score_error'] = [1] - df['mean_test_score'] df.boxplot(column=['mean_test_score_error'], by='modelo', medianprops = medianprops, autorange=True,showfliers=False, patch_artist=True, vert=True, showmeans=True,meanline=True) plt.ylabel('Error: 1-F1 Score') plt.title('Error de validación') plt.suptitle('') 

And I am getting the following two plots:

enter image description here

enter image description here

The question is if is possible plot the 6 boxplot on the same plot and to use different color for the each three boxplot of the each plot?

1 Answer 1

1
  • The easiest way to do this is transform the data from a wide to long format, and then plot with seaborn, using the hue parameter.
  • pandas.wide_to_long
    • There must be a unique id, hence adding the id column.
    • The columns being transformed, must have similar stubnames, which is why I moved error to the front of the column name.
      • The error column names will be in one column and the value in a separate column

Imports and Test Data

import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns # setup data and dataframe np.random.seed(365) data = {'mod_lg': np.random.normal(0.3, .1, size=(30,)), 'mod_rf': np.random.normal(0.05, .01, size=(30,)), 'mod_bg': np.random.normal(0.02, 0.002, size=(30,)), 'mean_train_score': np.random.normal(0.95, 0.3, size=(30,)), 'mean_test_score': np.random.normal(0.86, 0.5, size=(30,))} df = pd.DataFrame(data) df['error_mean_test_score'] = [1] - df['mean_test_score'] df['error_mean_train_score'] = [1] - df['mean_train_score'] df["id"] = df.index df = pd.wide_to_long(df, stubnames='mod', i='id', j='mode', sep='_', suffix='\D+').reset_index() df["id"] = df.index # display dataframe: this is probably what your dataframe looks like to generate your current plots id mode mean_train_score error_mean_test_score mean_test_score error_mean_train_score mod 0 0 lg 0.663855 -0.343961 1.343961 0.336145 0.316792 1 1 lg 0.990114 0.472847 0.527153 0.009886 0.352351 2 2 lg 1.179775 0.324748 0.675252 -0.179775 0.381738 3 3 lg 0.693155 0.519526 0.480474 0.306845 0.470385 4 4 lg 1.191048 -0.128033 1.128033 -0.191048 0.085305 

Transform and plot

  • The error_score_name column contains the suffix from error_mean_test_score & error_mean_train_score
  • The error_score_value column contains the values.
# convert df error columns to long format dfl = pd.wide_to_long(df, stubnames='error', i='id', j='score', sep='_', suffix='\D+').reset_index(level=1) dfl.rename(columns={'score': 'error_score_name', 'error': 'error_score_value'}, inplace=True) # display dfl error_score_name mean_train_score mod mean_test_score mode error_score_value id 0 mean_test_score 0.663855 0.316792 1.343961 lg -0.343961 1 mean_test_score 0.990114 0.352351 0.527153 lg 0.472847 2 mean_test_score 1.179775 0.381738 0.675252 lg 0.324748 3 mean_test_score 0.693155 0.470385 0.480474 lg 0.519526 4 mean_test_score 1.191048 0.085305 1.128033 lg -0.128033 # plot dfl sns.boxplot(x='mode', y='error_score_value', data=dfl, hue='error_score_name') 

enter image description here

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.