
I have the following code:

import seaborn as sns
import pandas as pd
import os

sns.set_theme(style="whitegrid")
df = pd.read_csv("C:/tmp/all.csv")
sns.boxplot(x="cluster", y="val", hue="type", palette=["m", "g"], data=df)
sns.despine(offset=10, trim=True)

My CSV is:

index, cluster, type, val
1, 0-10, male, 1
2, 30-40, female, 5
3, 30-40, male, 3
4, 50-60, male, 7
5, 50-60, female, 1
...

The max value of val is 10.

My output is:

enter image description here

But what I want is a boxplot of the values, grouped by cluster. In my output, I seem to be getting the number of counts for each cluster instead. The maximum val is actually 10. What am I doing wrong?

  • Have you confirmed the values after pd.read? Try running df.describe(include="all"). Commented Oct 17, 2021 at 20:19
  • Can you add the output of df.tail() to verify that the dataframe really is as described? Commented Oct 17, 2021 at 21:30
  • I checked again and can really confirm, that the dataframe is as described above. Commented Oct 18, 2021 at 8:10
  • @CenkTen According to this information, there is at least one value above 2000 (see the max of val). What you can do is limit the y range using plt.ylim([-1, 11]), but this only hides the outliers that fall outside the drawing area. Commented Oct 18, 2021 at 8:15
  • The boxplot marks anything outside the whiskers as outliers. See link Commented Oct 18, 2021 at 8:31

1 Answer

It seems that there are significant outliers beyond your described upper limit of 10. This can be seen visually in the figure, as well as in the table submitted in comments.
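One way to confirm this before plotting is to look at the summary statistics and list the rows that exceed the expected limit. The following is a minimal sketch: the DataFrame is a hypothetical stand-in for the real CSV, and the value 2500 is an invented outlier used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV; the value 2500 is an invented outlier.
df = pd.DataFrame({
    "cluster": ["0-10", "30-40", "30-40", "50-60", "50-60", "50-60"],
    "type": ["male", "female", "male", "male", "female", "male"],
    "val": [1, 5, 3, 7, 1, 2500],
})

expected_max = 10  # the upper limit stated in the question

# Summary statistics: compare the reported max against the expected limit.
print(df["val"].describe())

# Rows whose value exceeds the expected limit.
too_large = df[df["val"] > expected_max]
print(too_large)
```

If `too_large` is not empty, the stretched y axis comes from the data itself rather than from how seaborn draws the boxplot.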

Limit the y range - quick and dirty approach #1
Set the y-axis limits manually like so:

import matplotlib.pyplot as plt
plt.ylim([-1, 11])

In your code:

import matplotlib.pyplot as plt  # <--- import here
import seaborn as sns
import pandas as pd
import os

sns.set_theme(style="whitegrid")
df = pd.read_csv("C:/tmp/all.csv")
sns.boxplot(x="cluster", y="val", hue="type", palette=["m", "g"], data=df)
sns.despine(offset=10, trim=True)
plt.ylim([-1, 11])  # <--- limit y here

Not showing outliers - quick and dirty approach #2
Pass showfliers=False to sns.boxplot():

sns.boxplot(x="cluster", y="val", hue="type", palette=["m", "g"], data=df, showfliers=False)

Both approaches focus the graph on the interquartile information. I would prefer approach #1, since it does not remove the outliers from the plot, but #2 does not need manual configuration.
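The two approaches can be compared side by side. This is a sketch under stated assumptions: the data is hypothetical (the real CSV is not available), the 2500 value is an invented outlier, and the output filename is arbitrary. It uses `ax.set_ylim`, the Axes-level counterpart of `plt.ylim`, so that each subplot can be configured separately.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set_theme(style="whitegrid")

# Hypothetical data; 2500 is an invented outlier in the ("0-10", "male") group.
groups = {
    ("0-10", "male"): [1, 2, 3, 4, 5, 2500],
    ("0-10", "female"): [2, 3, 4, 5, 6, 7],
    ("30-40", "male"): [3, 4, 5, 6, 7, 8],
    ("30-40", "female"): [1, 3, 5, 7, 9, 10],
}
df = pd.DataFrame(
    [{"cluster": c, "type": t, "val": v} for (c, t), vals in groups.items() for v in vals]
)

fig, (ax_zoom, ax_nofliers) = plt.subplots(1, 2, figsize=(10, 4))

# Approach #1: draw everything, then restrict the visible y range.
sns.boxplot(x="cluster", y="val", hue="type", palette=["m", "g"], data=df, ax=ax_zoom)
ax_zoom.set_ylim(-1, 11)
ax_zoom.set_title("Approach #1: limited y range")

# Approach #2: do not draw the outlier markers at all.
sns.boxplot(x="cluster", y="val", hue="type", palette=["m", "g"], data=df,
            showfliers=False, ax=ax_nofliers)
ax_nofliers.set_title("Approach #2: showfliers=False")

fig.savefig("boxplot_comparison.png")  # arbitrary filename chosen for this sketch
```

In an interactive session, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` called instead of `fig.savefig(...)`.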


2 Comments

sns.boxplot(showfliers = False) works but without outliers. How about plt? I don't have a plt object?
Sorry, but I still don't understand how I should apply your first solution in my case.
