1

I have a dataframe with 150 columns and 800 rows. Each row represents a sample, which belongs to one of 5 classes. Therefore all samples are pre-classified. I need to create 150 boxplot charts, one for each column (variable), showing the distribution of the data between the classes, for that variable.

I managed to write code that generates the charts, but I have to adjust each of the 150 lines by hand to set the position of the chart, which follows the sequence [0,0], [0,1], [0,2], [1,0], [1,1], [1,2] etc., as well as the y variable, which could come from a list, but I don't know how to do this.

Below is an example of what it looks like. I did the first 9 by hand, but doing the rest of the 150 that way would be a lot of work. It should be possible to automate this, I think, but I don't know how. Does anyone have an idea?

    fig, axes = plt.subplots(3, 3, figsize=(18, 12))
    fig.suptitle('SAPIENS BOXPLOTS')
    sns.boxplot(ax=axes[0, 0], data=sapiens, x='classe', y='meanB0')
    sns.boxplot(ax=axes[0, 1], data=sapiens, x='classe', y='meanB1')
    sns.boxplot(ax=axes[0, 2], data=sapiens, x='classe', y='meanB2')
    sns.boxplot(ax=axes[1, 0], data=sapiens, x='classe', y='meanB3')
    sns.boxplot(ax=axes[1, 1], data=sapiens, x='classe', y='meanB4')
    sns.boxplot(ax=axes[1, 2], data=sapiens, x='classe', y='varB0')
    sns.boxplot(ax=axes[2, 0], data=sapiens, x='classe', y='varB1')
    sns.boxplot(ax=axes[2, 1], data=sapiens, x='classe', y='varB2')
    sns.boxplot(ax=axes[2, 2], data=sapiens, x='classe', y='varB3')

[Image: BOXPLOTS_SAPIENS]

0

3 Answers

5

Imports & Test DataFrame

    import pandas as pd
    import seaborn as sns
    import numpy as np  # for sample data

    # set seed for reproducibility
    np.random.seed(1)

    # create arrays of random sample data
    cl = np.random.choice(range(1, 6), size=(100, 1))
    d = np.random.random_sample(size=(100, 6))

    # combine the two arrays
    data = np.concatenate([cl, d], axis=1)

    # create a dataframe
    sapiens = pd.DataFrame(data, columns=['classe', 'mB0', 'mB1', 'mB2', 'vB0', 'vB1', 'vB2'])

       classe       mB0       mB1       mB2       vB0       vB1       vB2
    0     4.0  0.647749  0.353939  0.763233  0.356532  0.752788  0.881342
    1     5.0  0.011669  0.498109  0.073792  0.786951  0.064067  0.355310
    2     1.0  0.941837  0.379803  0.762920  0.771595  0.301360  0.772739

Melt and Plot

  • If there are extra columns that don't need to be plotted, exclude them before or while melting the dataframe.
  • For data that needs to be scaled differently, use the sharey=False parameter
    • sns.catplot(..., sharey=False)
    • However, the issue with this is that independent y-axes visually obscure the differences between the distributions.
      • Alternatively, try p.set(yscale='log') or p.set(yscale='symlog') after the line creating the plot.
  • p.set_xticklabels(visible=True) should show xtick labels on all axes, but it adds labels to both the top and the bottom, so the code below uses an alternative.
    # convert from wide format to tidy format
    sm = sapiens.melt(id_vars='classe')

       classe variable     value
    0     4.0      mB0  0.647749
    1     5.0      mB0  0.011669
    2     1.0      mB0  0.941837
    3     2.0      mB0  0.152930
    4     4.0      mB0  0.467393

    # plot
    p = sns.catplot(kind='box', data=sm, x='classe', y='value', col='variable', col_wrap=3, height=4)

    # add figure level title
    p.fig.subplots_adjust(top=0.9)
    p.fig.suptitle('Sapiens', size=16)

    # enable tick labels for xticks on all axes
    for ax in p.axes.flat:
        ax.tick_params(labelbottom=True)

    p.tight_layout()
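On the point about extra columns: a minimal sketch of two ways to keep them out of the melted frame, assuming a small stand-in for sapiens. The sample_id column and the choice of which columns to plot are hypothetical and only serve the illustration; value_vars is a standard parameter of DataFrame.melt.

```python
import pandas as pd

# Small stand-in for `sapiens`; 'sample_id' is a hypothetical extra column
# that should not be plotted.
sapiens = pd.DataFrame({
    'sample_id': ['a', 'b', 'c', 'd'],
    'classe': [1, 2, 1, 2],
    'mB0': [0.1, 0.2, 0.3, 0.4],
    'mB1': [0.5, 0.6, 0.7, 0.8],
    'vB0': [0.9, 1.0, 1.1, 1.2],
})

# Option 1: name the columns to plot explicitly with value_vars.
plot_cols = ['mB0', 'mB1']
sm = sapiens.melt(id_vars='classe', value_vars=plot_cols)

# Option 2: drop the unwanted columns first, then melt everything that is left.
sm_alt = sapiens.drop(columns=['sample_id', 'vB0']).melt(id_vars='classe')

print(sm)
```

Either result has the same classe / variable / value layout as sm above, so it can be passed to sns.catplot unchanged.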

1 Comment

Now it worked perfectly with the parameter sharey=False!
0

First, select the columns of sapiens that will serve as y for each boxplot. Assuming that the first column is classe and you want to plot every column after it, this is how you do it:

    # get y values
    y_labels = sapiens.columns[1:]

Next, decide on figsize, nrows, and ncols for plt.subplots. Finally, draw the plots in a loop.

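    import math
    import matplotlib.pyplot as plt
    import seaborn as sns

    # calculate the grid and figure size
    ncols = 3
    nrows = math.ceil(len(y_labels) / ncols)
    figsize = (ncols * 6, nrows * 4)

    # assign fig and axes; squeeze=False keeps axes 2-D even with a single row
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize, squeeze=False)
    fig.suptitle('SAPIENS BOXPLOTS')

    # set y_labels index
    y_idx = 0

    # drawing plots, row by row
    for axs in axes:
        for ax in axs:
            if y_idx < len(y_labels):
                sns.boxplot(ax=ax, data=sapiens, x='classe', y=y_labels[y_idx])
            else:
                # hide unused axes in the last row
                ax.set_visible(False)
            # update y_idx
            y_idx += 1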

0

You can use a loop and use divmod to determine the axes:

    # axes is the 2-D array returned by plt.subplots, n_cols its number of columns
    for i, y in enumerate(y_labels):
        sns.boxplot(ax=axes[divmod(i, n_cols)], data=sapiens, x='classe', y=y)
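For completeness, a self-contained sketch of the divmod approach, assuming a dataframe laid out like sapiens (one class column, every other column a variable to plot). The helper names grid_shape, grid_position and plot_all_boxplots are hypothetical, introduced only for this illustration; the default of 3 columns and the 6 x 4 inch panel size are assumptions carried over from the figures used earlier in the thread.

```python
import math


def grid_shape(n_plots, n_cols=3):
    """Return (n_rows, n_cols) for a grid large enough to hold n_plots panels."""
    return max(1, math.ceil(n_plots / n_cols)), n_cols


def grid_position(i, n_cols=3):
    """Map a flat plot index to its (row, col) position, filling row by row."""
    return divmod(i, n_cols)


def plot_all_boxplots(df, class_col, n_cols=3):
    """Draw one boxplot per non-class column of df on a shared grid of axes."""
    # Imported here so the two helpers above work without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    y_labels = [c for c in df.columns if c != class_col]
    n_rows, n_cols = grid_shape(len(y_labels), n_cols)

    # squeeze=False keeps `axes` 2-D, so axes[(row, col)] always works.
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(n_cols * 6, n_rows * 4), squeeze=False)

    for i, y in enumerate(y_labels):
        sns.boxplot(ax=axes[grid_position(i, n_cols)], data=df, x=class_col, y=y)

    # Hide the unused axes left over in the last row, if any.
    for j in range(len(y_labels), n_rows * n_cols):
        axes[grid_position(j, n_cols)].set_visible(False)

    return fig, axes


# The first six positions reproduce the sequence from the question.
print([grid_position(i) for i in range(6)])
```

With the test dataframe from the first answer, this would be called as fig, axes = plot_all_boxplots(sapiens, 'classe'). For 150 variables and 3 columns, grid_shape gives a 50 x 3 grid.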
