I have a dataset with multiple categories, and I want to plot them all in a single figure to see how something changes. I have a list of the categories in the dataset, and I would like to see them all plotted in the same figure.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.cm as cm

    sample = [
        ['For business', 0.7616104043587437],
        ['For home and cottages', 0.6890139579274699],
        ['Consumer electronics', 0.039868871866136635],
        ['Personal things', 0.7487893699793786],
        ['Services', 0.747226678171249],
        ['Services', 0.23463661173977313],
        ['Animals', 0.6504301798258314],
        ['For home and cottages', 0.49567857024037665],
        ['For home and cottages', 0.9852681814098107],
        ['Transportation', 0.8134867587477912],
        ['Animals', 0.49988690699674654],
        ['Consumer electronics', 0.15086800344617235],
        ['For business', 0.9485494576819328],
        ['Hobbies and Leisure', 0.25766871111905243],
        ['For home and cottages', 0.31704508627659533],
        ['Animals', 0.6192114570078333],
        ['Personal things', 0.5755788287287359],
        ['Hobbies and Leisure', 0.10106922056341394],
        ['Animals', 0.16834618003738577],
        ['Consumer electronics', 0.7570803588496894]
    ]

    train = pd.DataFrame(data=sample, columns=['parent_category_name', 'deal_probability'])

    parent_categories = train['parent_category_name'].unique()
    parent_categories_size = len(parent_categories)

    fig, ax = plt.subplots(figsize=(12, 10))
    # One color per category, taken from the rainbow colormap
    colors = iter(cm.rainbow(np.linspace(0, 1, parent_categories_size)))

    for parent_category_n in range(parent_categories_size):
        # Select the rows of the current category (indexed by the loop variable)
        parent_1 = train[train['parent_category_name'] == parent_categories[parent_category_n]]
        ax.scatter(
            range(parent_1.shape[0]),
            np.sort(parent_1.deal_probability.values),
            color=next(colors)
        )

    plt.ylabel('likelihood that an ad actually sold something', fontsize=12)
    plt.title('Distribution of likelihood that an ad actually sold something')

I have no idea why I can only see the last plot instead of all of them. Alternatively, I could work with multiple scatter plots in one figure, but I'm having a hard time plotting this.

Currently I'm working with 10 categories but I'm trying to make it dynamic.

  • I've tried to use something similar to what is asked here (stackoverflow.com/questions/48380953/…) but I'm only getting the last plot. Commented May 26, 2018 at 8:42
  • I'm using matplotlib 2.1.2 Commented Jun 2, 2018 at 7:49
  • Sorry for that, editing the question. Commented Jun 4, 2018 at 7:47
  • Now I can reproduce an output, but I can't reproduce your problem. The diagram displays all points. Several questions, though: 1) Do you mean ylabel instead of xlabel? This, like plt.title, doesn't need to be within the loop, because you only have to set it once. 2) Why do you first retrieve parent_categories from your dataframe and then overwrite it with a predefined list? 3) Your code does not use the categorical data; instead, it plots the probabilities in ascending order against the position number within the category. Is this the intention? Commented Jun 4, 2018 at 8:19
  • Thanks for following up on this! I'm trying to plot multiple figures (one for each category_name) so I can see whether deal_probability grows higher for some of them. I have 10 category_names and I want to plot one graph for each of them. 1) You're right; fixed in my code and edited the question here. Also removed it from the loop. 2) Another mistake I introduced when trying to make it an MCVE. 3) Yes, that's the intention: to plot the probabilities in ascending order for each category in a different figure. Commented Jun 4, 2018 at 21:09

1 Answer

If you want to observe the development over time, a line plot with markers is probably better to visualize the changes in each category:

    import numpy as np
    import pandas as pd
    from matplotlib import pyplot as plt
    import matplotlib.cm as cm

    sample = [
        ['For business', 0.7616104043587437],
        ['For home and cottages', 0.6890139579274699],
        ['Consumer electronics', 0.039868871866136635],
        ['Personal things', 0.7487893699793786],
        ['Services', 0.747226678171249],
        ['Services', 0.23463661173977313],
        ['Animals', 0.6504301798258314],
        ['For home and cottages', 0.49567857024037665],
        ['For home and cottages', 0.9852681814098107],
        ['Transportation', 0.8134867587477912],
        ['Animals', 0.49988690699674654],
        ['Consumer electronics', 0.15086800344617235],
        ['For business', 0.9485494576819328],
        ['Hobbies and Leisure', 0.25766871111905243],
        ['For home and cottages', 0.31704508627659533],
        ['Animals', 0.6192114570078333],
        ['Personal things', 0.5755788287287359],
        ['Hobbies and Leisure', 0.10106922056341394],
        ['Animals', 0.16834618003738577],
        ['Consumer electronics', 0.7570803588496894]
    ]

    train = pd.DataFrame(data=sample, columns=['parent_category_name', 'deal_probability'])
    parent_categories = train['parent_category_name'].unique()

    fig, ax = plt.subplots(figsize=(10, 8))
    # One color per category, taken from the rainbow colormap
    colors = iter(cm.rainbow(np.linspace(0, 1, len(parent_categories))))

    for parent_category in parent_categories:
        # Rows of the current category, plotted in ascending order of probability
        subset = train[train["parent_category_name"] == parent_category]
        ax.plot(range(len(subset)),
                sorted(subset.deal_probability.values),
                color=next(colors),
                marker="o",
                label=parent_category)

    plt.ylabel('likelihood that an ad actually sold something', fontsize=12)
    plt.title('Distribution of likelihood that an ad actually sold something')
    plt.legend(loc="best")
    plt.show()


But since this is an arbitrary scale and you sort the data, in my opinion you can see the spread even better in a categorical plot:

    # Reuses the imports and `sample` from the previous snippet
    train = pd.DataFrame(data=sample, columns=['parent_category_name', 'deal_probability'])
    parent_categories = train['parent_category_name'].unique()

    fig, ax = plt.subplots(figsize=(18, 9))
    colors = iter(cm.rainbow(np.linspace(0, 1, len(parent_categories))))

    for parent_category in parent_categories:
        subset = train[train["parent_category_name"] == parent_category]
        # Category names on the x-axis, probabilities on the y-axis
        ax.scatter(
            subset.parent_category_name.values,
            subset.deal_probability.values,
            color=next(colors),
            label=parent_category
        )

    plt.ylabel('likelihood that an ad actually sold something', fontsize=12)
    plt.title('Distribution of likelihood that an ad actually sold something')
    plt.legend(loc="best")
    plt.show()

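The question and the follow-up comment also mention wanting one plot per category, with a category count that may change. A minimal sketch of that variant, assuming a grid of subplots in a single figure is acceptable: `plot_category_grid` and its `ncols` parameter are hypothetical names introduced here for illustration, and the data is a rounded subset of the question's sample.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd

# Rounded subset of the question's sample data (assumption: shortened for brevity)
sample = [
    ['Services', 0.7472],
    ['Services', 0.2346],
    ['Animals', 0.6504],
    ['Transportation', 0.8135],
    ['Animals', 0.4999],
    ['Animals', 0.6192],
    ['Animals', 0.1683],
]
train = pd.DataFrame(data=sample, columns=['parent_category_name', 'deal_probability'])


def plot_category_grid(df, ncols=3):
    """Draw one scatter subplot per category (hypothetical helper)."""
    # sort=False keeps the categories in order of first appearance
    groups = list(df.groupby('parent_category_name', sort=False))
    nrows = math.ceil(len(groups) / ncols)
    # squeeze=False always returns a 2D array of axes, even for one row
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows),
                             sharey=True, squeeze=False)
    flat_axes = axes.ravel()
    for ax, (name, group) in zip(flat_axes, groups):
        # Sorted probabilities against their position within the category
        values = group['deal_probability'].sort_values().values
        ax.scatter(range(len(values)), values)
        ax.set_title(name)
    # Hide the unused panels when the grid has more cells than categories
    for ax in flat_axes[len(groups):]:
        ax.set_visible(False)
    # Label the shared y-axis once per row
    for row in axes:
        row[0].set_ylabel('deal_probability')
    fig.tight_layout()
    return fig, axes


fig, axes = plot_category_grid(train, ncols=2)
```

Calling `plt.show()` afterwards displays the figure. Because the grid size is computed from the number of groups, the same call should work for 10 categories or any other count; `sharey=True` keeps the panels on a common scale so the categories can be compared directly.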

1 Comment

Thanks a lot for such a detailed response! This gives me a lot to learn and look at!
