0

I want to create multiple (two in this case) boxplots based on data in a dataframe

I have the following dataframe:

       Country  Fund                                   R^2       Style
    0  Austria  BG EMCore Convertibles Global CHF R T  0.739131  Allocation
    1  Austria  BG EMCore Convertibles Global R T      0.740917  Allocation
    2  Austria  BG Trend A T                           0.738376  Fixed Income
    3  Austria  Banken Euro Bond-Mix A                 0.71161   Fixed Income
    4  Austria  Banken KMU-Fonds T                     0.778276  Allocation
    5  Brazil   Banken Nachhaltigkeitsfonds T          0.912808  Allocation
    6  Brazil   Banken Portfolio-Mix A                 0.857019  Allocation
    7  Brazil   Banken Portfolio-Mix T                 0.868856  Fixed Income
    8  Brazil   Banken Sachwerte-Fonds T               0.730626  Fixed Income
    9  Brazil   Banken Strategie Wachstum T            0.918684  Fixed Income

I want to create a boxplot chart for each country, grouped by Style and showing the distribution of R^2. I was thinking of a groupby operation, but somehow I can't manage to produce a separate chart for each of the two countries.

Thanks in advance

2
  • How should the data be grouped? Only by Country, or by Country and Style? Commented Aug 13, 2019 at 10:36
  • I guess by country and style. For each country, one boxplot chart consisting of two boxes for style, because we have Allocation and Fixed Income. Hope this answers the question. Commented Aug 13, 2019 at 10:40

3 Answers

2

Here you go. The description is in the code comments.

=^..^=

    import pandas as pd
    import matplotlib.pyplot as plt
    from io import StringIO

    data = StringIO("""
    Country R^2 Style
    Austria 0.739131 Allocation
    Austria 0.740917 Allocation
    Austria 0.738376 Fixed_Income
    Austria 0.71161 Fixed_Income
    Austria 0.778276 Allocation
    Brazil 0.912808 Allocation
    Brazil 0.857019 Allocation
    Brazil 0.868856 New_Style
    Brazil 0.730626 Fixed_Income
    Brazil 0.918684 Fixed_Income
    Brazil 0.618684 New_Style
    """)

    # load data into data frame (whitespace-separated columns)
    df = pd.read_csv(data, sep=r'\s+')

    # group data by Country
    grouped_data = df.groupby(['Country'])

    # create list of grouped data frames
    df_list = []
    country_list = []
    for item in list(grouped_data):
        df_list.append(item[1])
        country_list.append(item[0])

    # plot box for each Country
    for df in df_list:
        country = df['Country'].unique()
        df = df.drop(['Country'], axis=1)
        df = df[['Style', 'R^2']]
        # styles that actually occur for this country
        columns_names = list(set(df['Style']))
        # pivot rows into columns (keyword arguments are required by pandas >= 2.0)
        df = df.assign(g=df.groupby('Style').cumcount()).pivot(
            index='g', columns='Style', values='R^2')
        # plot box
        df.boxplot(column=columns_names)
        plt.title(country[0])
        plt.show()

Output: one box plot figure per country (two images in the original post, not reproduced here).
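
If a fixed list of styles is passed to df.boxplot, any country that lacks one of them raises an error. Below is a minimal sketch of one way around that, assuming the same Country / Style / R^2 columns as above. The function name boxplots_per_country and its wanted_styles parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd
import matplotlib.pyplot as plt


def boxplots_per_country(df, wanted_styles=None):
    """Draw one box plot per country and return {country: Axes}.

    wanted_styles is a hypothetical optional list of style names;
    styles a country does not have are skipped instead of raising.
    """
    axes = {}
    for country, sub in df.groupby("Country"):
        # one column per style, rows are just a running counter
        wide = sub.assign(g=sub.groupby("Style").cumcount()).pivot(
            index="g", columns="Style", values="R^2")
        cols = list(wide.columns)
        if wanted_styles is not None:
            # keep only the requested styles that exist for this country
            cols = [s for s in wanted_styles if s in wide.columns]
        if not cols:
            continue  # nothing to draw for this country
        fig, ax = plt.subplots()
        wide.boxplot(column=cols, ax=ax)
        ax.set_title(country)
        axes[country] = ax
    return axes
```

Calling boxplots_per_country(df, ['Allocation', 'Fixed_Income', 'Equity']) followed by plt.show() should then draw only the styles each country actually has.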


3 Comments

Thanks Zaraki. I think it will work. I didn't specify this, but there are many countries and multiple styles, for example equity, fixed income, allocation, etc. Therefore, in df.boxplot(column=['Allocation', 'Fixed_Income', 'Equity', ...]) I list all the styles. Yet for some countries not all styles apply, so when the code finds a country with fewer styles than specified in the list, it gives an error. Do you know how I can tackle this? Maybe there is some way to specify in df.boxplot(column=['Allocation', 'Fixed_Income', ...]) what should happen if some of the styles are not found.
@MartinYordanovGeorgiev I updated my code with line: column_names. Now it should handle different styles.
Thanks Zaraki, works just fine. Much appreciated. Just corrected one typo in df.boxplot(column=colums_names) "n" is omitted from colums_names. I posted an alternative answer myself. You can check below if interested.
1

I came up with a solution myself.

    import pandas as pd
    import matplotlib.pyplot as plt

    # df is the table from the original question
    uniquenames = df.Country.unique()

    # create dictionary of the data with countries set as keys,
    # each value holding only the rows of that country
    diction = {elem: df[df.Country == elem] for elem in uniquenames}

    # plot the data
    for i in diction.keys():
        diction[i].boxplot(column="R^2", by="Style", figsize=(15, 6),
                           patch_artist=True, fontsize=12)
        plt.xticks(rotation=90)
        plt.title(i, fontsize=12)
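
The dictionary is not strictly needed, since iterating over df.groupby('Country') yields the same per-country frames. Here is a minimal sketch under that assumption; the function name plot_r2_by_style is hypothetical, and rot=90 stands in for plt.xticks(rotation=90).

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_r2_by_style(df):
    """Draw one figure per country with one box per Style; return {country: Axes}."""
    axes = {}
    for country, sub in df.groupby("Country"):
        fig, ax = plt.subplots(figsize=(15, 6))
        # by="Style" splits the R^2 values of this country into one box per style
        sub.boxplot(column="R^2", by="Style", ax=ax,
                    patch_artist=True, fontsize=12, rot=90)
        fig.suptitle("")  # drop the automatic "Boxplot grouped by Style" title
        ax.set_title(country, fontsize=12)
        axes[country] = ax
    return axes
```

Outside a notebook, a plt.show() after the call is needed to display the figures.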


0

Use seaborn for this kind of task. Here are a couple of options:

Use seaborn's boxplot

    import seaborn as sns
    sns.set()

    # Note - the data is stored in a data frame df
    sns.boxplot(x='Country', y='R^2', hue='Style', data=df)


Alternatively, you can use seaborn's FacetGrid.

    g = sns.FacetGrid(df, col="Country", row="Style")
    g = g.map(sns.boxplot, 'R^2', orient='v')
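
The FacetGrid call above draws one panel per Country and Style pair. If one panel per country with the styles side by side is wanted instead, seaborn's catplot is a possible alternative. This is a minimal sketch assuming the same df; the wrapper name country_boxplots is hypothetical.

```python
import seaborn as sns


def country_boxplots(df):
    """One panel per country, one box per style (a sketch, not the answer's code)."""
    # kind="box" draws box plots; col="Country" creates one panel per country
    return sns.catplot(data=df, x="Style", y="R^2", col="Country", kind="box")
```

The returned object is a FacetGrid, so further tweaks such as g.set_axis_labels can be chained on it.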
