
I am trying to create a boxplot in which the groups on the x-axis are defined by two columns of a dataframe, while the y-axis shows the values of a third column.

Consider this example dataframe:

      Lvl1  Lvl2  value
    0    A     1      1
    1    A     2      2
    2    A     1      3
    3    B     2      4
    4    B     1      5
    5    B     2      6

Now I want one boxplot per (Lvl1, Lvl2) group. For example, for the group (Lvl1 = A, Lvl2 = 1) the boxplot would be computed from the values {1, 3}.

I know I can create a new column, say Lvl0, that combines Lvl1 and Lvl2, but is there a way to create the boxplot without that extra step?

With the following code:

    import pandas as pd
    import matplotlib.pyplot as plt

    dataset = pd.DataFrame(
        {'Lvl1': ['A', 'A', 'A', 'B', 'B', 'B'],
         'Lvl2': [1, 2, 1, 2, 1, 2],
         'value': [1, 2, 3, 4, 5, 6]})

    grouped = dataset.groupby(['Lvl1', 'Lvl2'])
    grouped.boxplot()
    plt.show()

I get an error:
KeyError: "None of [Index(['A', 1], dtype='object')] are in the [index]"

Thank you in advance!

2 Answers

1

Seaborn offers an easier solution. I think a similar question was answered here: Grouping boxplots in seaborn when input is a DataFrame

With your data:

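    import seaborn as sns
    import pandas as pd

    data = pd.DataFrame({'lvl1': ['A', 'A', 'A', 'B', 'B', 'B'],
                         'lvl2': [1, 2, 1, 2, 1, 2],
                         'value': [1, 2, 3, 4, 5, 6]})

    # melt stacks every non-id column (here 'lvl2' and 'value') into 'result';
    # the new 'lvl2' column then holds the name of the source column,
    # not the original 1/2 levels.
    df_long = pd.melt(data, "lvl1", var_name="lvl2", value_name="result")
    sns.boxplot(x="lvl1", hue="lvl2", y="result", data=df_long)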


If you need more levels, you can combine plots with sns.FacetGrid (https://seaborn.pydata.org/generated/seaborn.FacetGrid.html). Here I propose using sns.catplot, which builds the grid for you:

    import pandas as pd
    import seaborn as sns

    data = pd.DataFrame({'lvl1': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'B'],
                         'group': ['1', '2', '1', '2', '1', '2', '2', '1'],
                         'has_something': [True, False, False, True, True, False, True, False],
                         'before': [3, 4, 5, 5, 3, 4, 2, 6],
                         'after': [1, 2, 3, 4, 5, 6, 2, 3],
                         'baseline': [1, 0, 0, 1, 1, 0, 0, 1]})

    # Keep the grouping columns as identifiers; 'before', 'after' and
    # 'baseline' are stacked into 'result', labelled by 'variable'.
    df = pd.melt(data, ["lvl1", 'group', 'has_something'], value_name="result")

    # One panel per 'group', one box colour per measured variable.
    sns.catplot(data=df, x='lvl1', y='result', col='group', kind='box',
                hue='variable', col_wrap=2, margin_titles=True)


To include the 'has_something' variable in the plot as well, you can use FacetGrid, or split the data by 'has_something' and make two plots from the filtered subsets.
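As a minimal sketch of the FacetGrid route, the same sns.catplot call can take a row argument, so that 'has_something' becomes the grid rows while 'group' stays on the columns. This reuses the example data from above; dropping col_wrap (it cannot be combined with row) and selecting the Agg backend so the script runs without a display are choices made for this illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed

import pandas as pd
import seaborn as sns

data = pd.DataFrame({'lvl1': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'B'],
                     'group': ['1', '2', '1', '2', '1', '2', '2', '1'],
                     'has_something': [True, False, False, True, True, False, True, False],
                     'before': [3, 4, 5, 5, 3, 4, 2, 6],
                     'after': [1, 2, 3, 4, 5, 6, 2, 3],
                     'baseline': [1, 0, 0, 1, 1, 0, 0, 1]})

# Long format: the three grouping columns stay as identifiers.
df = pd.melt(data, ['lvl1', 'group', 'has_something'], value_name='result')

# Rows split by 'has_something', columns by 'group', hue by measured variable.
g = sns.catplot(data=df, x='lvl1', y='result', hue='variable',
                col='group', row='has_something',
                kind='box', margin_titles=True)
```

Each further categorical level then has to go to one of the remaining slots (x, hue, col, row); beyond four levels, filtering the data and drawing separate figures is probably the simpler option.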


2 Comments

Thank you. How could I extend it if there were more than 2 levels? I don't see exactly how melt helps us here.
I added a part about more levels to the answer. I hope it helps.
0

You can do it with seaborn. The following code works for me on your data:

    import pandas as pd
    import seaborn as sns

    dataset = pd.DataFrame(
        {
            'Lvl1': ['A', 'A', 'A', 'B', 'B', 'B'],
            'Lvl2': [1, 2, 1, 2, 1, 2],
            'value': [1, 2, 3, 4, 5, 6]
        }
    )

    # Lvl1 goes on the x-axis; Lvl2 splits each x position into coloured boxes.
    ax = sns.boxplot(x='Lvl1', y='value', hue="Lvl2", data=dataset)
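If seaborn is not available, one possible alternative is to collect the values of each (Lvl1, Lvl2) group with groupby and pass them to matplotlib directly, which avoids adding a combined column to the dataframe. This is only a sketch: the "A-1" style tick labels and the Agg backend are assumptions made for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed

import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.DataFrame(
    {'Lvl1': ['A', 'A', 'A', 'B', 'B', 'B'],
     'Lvl2': [1, 2, 1, 2, 1, 2],
     'value': [1, 2, 3, 4, 5, 6]})

# Iterating over a multi-key groupby yields ((Lvl1, Lvl2), values) pairs.
groups = dataset.groupby(['Lvl1', 'Lvl2'])['value']
labels = [f"{lvl1}-{lvl2}" for (lvl1, lvl2), _ in groups]
values = [series.to_numpy() for _, series in groups]

fig, ax = plt.subplots()
ax.boxplot(values)  # one box per group, drawn at positions 1..N
ax.set_xticks(range(1, len(labels) + 1))
ax.set_xticklabels(labels)
ax.set_ylabel('value')
```

The same pattern extends to three or more grouping columns by adding them to the groupby list and to the label format.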


1 Comment

Thank you. Is there an extension for the case where more than 2 columns are taken into account, say 3 levels or more?
