I have a sparse scatter plot that compares predicted vs. actual values. The values range from 1 to 4 and are all integers.

I have tried plotly so far with the following code (but I can also use a matplotlib solution):

    import plotly.graph_objs as go

    my_scatter = go.Scatter(
        x=y_actual,
        y=y_pred,
        mode='markers',
        marker=dict(color='rgb(240, 189, 89)', opacity=0.5),
    )

This draws the graph nicely (see below). I use opacity to show the density at each point, i.e. if two points lie on top of each other, the point is drawn in a darker color. However, this is not explanatory enough. Is it possible to add the count at each point as a label? Several points overlap at certain coordinates, and I want to display how many points coincide there. Can this be done automatically using matplotlib or plotly?

[image: scatter plot of predicted vs. actual values]

  • @ImportanceOfBeingErnest I am sorry for this. I hope it looks better now! (It could be because of my poor English.) Commented Apr 21, 2017 at 10:00
  • In matplotlib there is no automatic way to do what you want (I don't know about plotly, though). You may need to find out which points overlap, possibly by using a numpy histogram2d or a pandas pivot table. Then you could annotate the points (e.g. using matplotlib.text). Commented Apr 21, 2017 at 12:38
  • @ImportanceOfBeingErnest Do you have any recommendation for representing the data with a different plot? Commented Apr 21, 2017 at 13:53
  • I think the plot itself is fine, although I would probably choose a colormap to represent frequency instead of opacity alone. You may also vary the size of the points (the more points, the larger the dot). You may also use a hexbin plot. If you can provide some data to play with, I could surely provide an answer. Commented Apr 21, 2017 at 14:00
  • Thanks @ImportanceOfBeingErnest. My data for the plot is pretty straightforward, just lists of integers ranging between 0 and 4. For example, y_actual: [3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0] and y_predict: [1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1]. Commented Apr 21, 2017 at 14:09

1 Answer

This answer uses matplotlib.

To answer the initial question first: to be able to annotate the points, you need to find out how often the data produces a point at a given coordinate. If all values are integers, this can easily be done using a 2D histogram. From the histogram, one would then select only those bins where the count is nonzero and annotate the respective values in a loop:

    import matplotlib.pyplot as plt
    import numpy as np

    x = [3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0]
    y = [1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1]
    x = np.array(x)
    y = np.array(y)

    # 2D histogram with unit-wide bins; y goes first so that
    # the rows of hist index y and the columns index x
    hist, ybins, xbins = np.histogram2d(y, x, bins=range(6))
    X, Y = np.meshgrid(xbins[:-1], ybins[:-1])
    # keep only the coordinates that actually contain points
    X = X[hist != 0]
    Y = Y[hist != 0]
    Z = hist[hist != 0]

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=49, alpha=0.4)

    # label each occupied coordinate with its count
    for i in range(len(Z)):
        ax.annotate(str(int(Z[i])), xy=(X[i], Y[i]), xytext=(4, 0),
                    textcoords="offset points")

    plt.show()

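If you would rather not pick bins at all, the counting step could also be sketched with the standard library's collections.Counter, which simply tallies identical (x, y) pairs. The helper name count_points is made up for illustration; the data are the same lists as above.

```python
from collections import Counter


def count_points(xs, ys):
    """Return a Counter mapping each (x, y) pair to how often it occurs."""
    return Counter(zip(xs, ys))


x = [3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0]
y = [1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1]

counts = count_points(x, y)

# One line per occupied coordinate, e.g. for use as annotation text
for (px, py), n in sorted(counts.items()):
    print(f"({px}, {py}): {n}")
```

Each pair and its count can then be handed to ax.annotate in the same way as X, Y and Z in the loop above.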

You may then decide not to plot all points but rather the result of the histogramming, which offers the chance to change the color and size of the scatter points according to the count:

    ax.scatter(X, Y, s=(Z*20)**1.4, c=Z/Z.max(), cmap="winter_r", alpha=0.4)


Since all values are integers, you may also opt for an image plot:

    fig, ax = plt.subplots()
    ax.imshow(hist, cmap="PuRd")

    for i in range(len(Z)):
        ax.annotate(str(int(Z[i])), xy=(X[i], Y[i]), xytext=(0, 0), color="w",
                    ha="center", va="center", textcoords="offset points")

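One detail that is easy to get wrong here: np.histogram2d puts its first argument along the rows of the result, which is why y is passed first, so that imshow shows x horizontally. A small check on the same data (the n_at_... variable names are just for illustration):

```python
import numpy as np

x = np.array([3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0])
y = np.array([1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1])

# Passing y first makes the first axis (rows) of hist correspond to y
hist, ybins, xbins = np.histogram2d(y, x, bins=np.arange(6))

# Index as hist[y_value, x_value]
n_at_x3_y0 = int(hist[0, 3])  # points with x == 3 and y == 0
n_at_x1_y4 = int(hist[4, 1])  # points with x == 1 and y == 4
n_at_x4_y1 = int(hist[1, 4])  # points with x == 4 and y == 1 (none in this data)
```

Note also that imshow places row 0 at the top by default, so y increases downwards in the image; passing origin="lower" to imshow flips this if you prefer the usual orientation.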

Another option, which avoids having to calculate the number of occurrences yourself, is a hexbin plot. Due to the hexagonal binning, this gives slightly inaccurate positions of the dots, but I still wanted to mention this option.

    import matplotlib.pyplot as plt
    import matplotlib.colors
    import numpy as np

    # x and y are the lists from the first snippet
    x = np.array(x)
    y = np.array(y)

    fig, ax = plt.subplots()

    # copy the PuRd colormap, but make its lowest entry white
    # so that empty bins blend into the background
    cmap = plt.cm.PuRd
    cmaplist = [cmap(i) for i in range(cmap.N)]
    cmaplist[0] = (1.0, 1.0, 1.0, 1.0)
    cmap = matplotlib.colors.LinearSegmentedColormap.from_list('mcm', cmaplist, cmap.N)

    ax.hexbin(x, y, gridsize=20, cmap=cmap, linewidth=0)

    plt.show()

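Although this answer focuses on matplotlib, the same counting idea should carry over to plotly, which the question started from. The following is only a minimal sketch: the helper names scatter_args_with_counts and make_figure, the marker size formula and the "Viridis" colorscale are my own choices for illustration, not something prescribed by plotly.

```python
from collections import Counter


def scatter_args_with_counts(actual, predicted):
    """Build keyword arguments for a plotly Scatter trace with count labels."""
    counts = Counter(zip(actual, predicted))
    points = sorted(counts)
    n = [counts[p] for p in points]
    return dict(
        x=[p[0] for p in points],
        y=[p[1] for p in points],
        mode="markers+text",          # draw the marker and a text label
        text=[str(c) for c in n],     # the label is the number of points
        textposition="middle right",
        marker=dict(size=[10 + 6 * c for c in n], color=n,
                    colorscale="Viridis", showscale=True),
    )


def make_figure(actual, predicted):
    # Imported here so the counting helper above works without plotly installed
    import plotly.graph_objects as go

    fig = go.Figure(go.Scatter(**scatter_args_with_counts(actual, predicted)))
    fig.update_layout(xaxis_title="actual", yaxis_title="predicted")
    return fig


y_actual = [3, 0, 1, 2, 2, 0, 1, 3, 3, 3, 4, 1, 4, 3, 0]
y_pred = [1, 0, 4, 3, 2, 1, 4, 0, 3, 0, 4, 2, 3, 3, 1]
args = scatter_args_with_counts(y_actual, y_pred)
```

Calling make_figure(y_actual, y_pred).show() should then display one marker per occupied coordinate, sized and colored by its count and labelled with the number of points.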
