
I've got a bunch of plots to make with large numbers of points in them. When I try to do it with matplotlib it takes hours, which isn't convenient. What alternative approaches exist?

The relevant bit of my code is as follows, where the number of points for each feature could easily be 100,000:

import matplotlib.patches as mpatches

marker = 'o'
s = 10
patches = []
for feature, color in zip(features, colors):
    for point, value in zip(tsne, df[feature].values):
        try:
            plt.scatter(point[0], point[1], alpha=value, facecolor=color,
                        marker=marker, s=s, label=feature)
        except:
            pass
    patches.append(mpatches.Rectangle((0, 0), 1, 1, fc=color))
plt.legend(patches, features, prop={'size': 15}, loc='center left',
           bbox_to_anchor=(1, 0.5))
plt.show()
  • You could speed things up a lot by not plotting each point individually. scatter accepts arrays, including for color (which can be specified as RGBA). Commented Oct 24, 2019 at 15:31
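The comment's suggestion can be sketched as follows: one scatter call for all points, with the per-point transparency baked into an (n, 4) RGBA color array. The data here is synthetic; `xy` and `values` are stand-ins for the question's `tsne` and `df[feature].values`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

rng = np.random.default_rng(0)
n = 100_000
xy = rng.normal(size=(n, 2))   # stand-in for `tsne`
values = rng.random(n)         # stand-in for df[feature].values

# one RGBA row per point, with the value written into the alpha channel
rgba = np.tile(to_rgba("tab:blue"), (n, 1))
rgba[:, 3] = values

fig, ax = plt.subplots()
ax.scatter(xy[:, 0], xy[:, 1], c=rgba, marker="o", s=10)  # single call
```

This avoids the per-point `alpha` keyword entirely, so it works on any matplotlib version that accepts a 2D color array for `c`.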

1 Answer


Running your inner loop:

for point, value in zip(tsne, df[feature].values):
    try:
        plt.scatter(point[0], point[1], alpha=value, facecolor=color,
                    marker=marker, s=s, label=feature)
    except:
        pass

with 1D NumPy arrays instead will speed things up considerably, because you make one scatter call per feature rather than one per point.

The inner loop could be replaced with something like:

x = tsne[:, 0]  # is `tsne` an (n, 2) numpy array?
y = tsne[:, 1]
alpha_values = df[feature].values
try:
    # note: passing an array to `alpha` requires matplotlib >= 3.4;
    # on older versions, fold the values into an (n, 4) RGBA array instead
    plt.scatter(x, y, alpha=alpha_values, facecolor=color,
                marker=marker, s=s, label=feature)
except:
    pass

If things are still too slow, you could also switch over to Datashader (e.g. via HoloViews), but try removing the inner for loop first, since that is what is slowing you down the most.
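Putting it together, the question's whole snippet collapses to one scatter call per feature. Below is a minimal sketch with synthetic data; the names `tsne`, `features`, `colors`, and `df` mimic the question's variables, not the real dataset. Per-point alpha is folded into an RGBA array so the code does not depend on matplotlib >= 3.4.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.colors import to_rgba

rng = np.random.default_rng(1)
n = 100_000
tsne = rng.normal(size=(n, 2))       # stand-in for the real embedding
features = ["feat_a", "feat_b"]      # hypothetical feature names
colors = ["tab:blue", "tab:orange"]
df = pd.DataFrame({f: rng.random(n) for f in features})

fig, ax = plt.subplots()
patches = []
for feature, color in zip(features, colors):
    # fold per-point alpha into an (n, 4) RGBA array: works on all
    # matplotlib versions, unlike passing an array to `alpha`
    rgba = np.tile(to_rgba(color), (n, 1))
    rgba[:, 3] = df[feature].values
    ax.scatter(tsne[:, 0], tsne[:, 1], c=rgba, marker="o", s=10, label=feature)
    patches.append(mpatches.Rectangle((0, 0), 1, 1, fc=color))
ax.legend(patches, features, prop={"size": 15}, loc="center left",
          bbox_to_anchor=(1, 0.5))
```

With 100,000 points per feature this makes two scatter calls total instead of 200,000, which is where the hours-long runtime was going.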
