When plotting scatter plots in matplotlib and saving to a vector format (PDF in this case), the generated file size scales with the number of points.

Since I have many points with a large amount of overlap, I set alpha=.2 to see how densely the points are distributed. In the central regions, so many points overlap that the displayed color is indistinguishable from alpha=1.

Is there any way to "crop" these regions (for instance by combining overlapping points within a specified distance) when saving the figure to a vector file, so that some kind of area is saved instead of every single point?

What I forgot to mention: since I need to plot the correlations of multiple variables, I need an (n x n) scatter plot matrix, where n is the number of variables. This makes hexbin and similar methods harder to use, since I'd have to build the full grid of plots myself.

For example as in:

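    import numpy as np
    import matplotlib.pyplot as plt

    fig_sc = plt.figure(figsize=(5, 5))
    ax_sc = fig_sc.gca()
    # 100000 hollow, semi-transparent markers; each one is written to the PDF
    ax_sc.scatter(
        np.random.normal(size=100000),
        np.random.normal(size=100000),
        s=10, marker='o', facecolors='none', edgecolors='black', alpha=.3)
    fig_sc.savefig('test.pdf', format='pdf')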

This results in a file size of approximately 1.5 MB, since each point is saved individually. Can I somehow "reduce" this image by combining overlapping points?
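One way to read "combining overlapping points" is to snap the points to a grid and keep a single marker per occupied cell. The following is a minimal sketch under that assumption, not an established matplotlib feature: `thin_points` and `thinned_colors` are hypothetical helper names, the cell size stands in for the "specified distance", and all points in a cell are treated as exactly coincident. The merged opacity uses the usual compositing rule for k stacked layers of the same color, 1 - (1 - alpha)^k, which is also why dense regions look like alpha=1.

```python
import numpy as np


def thin_points(x, y, cell, alpha):
    """Merge points that fall into the same square cell of side `cell`.

    Returns the centres of the occupied cells and, for each cell, the
    opacity that its stacked markers of opacity `alpha` composite to.
    (Hypothetical helper, for illustration only.)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Integer cell index of every point.
    idx = np.floor(np.column_stack([x, y]) / cell).astype(np.int64)
    # One row per occupied cell, plus the number of points that landed in it.
    cells, counts = np.unique(idx, axis=0, return_counts=True)
    centres = (cells + 0.5) * cell
    # k layers of the same colour composite to opacity 1 - (1 - alpha)**k.
    merged_alpha = 1.0 - (1.0 - alpha) ** counts
    return centres[:, 0], centres[:, 1], merged_alpha


def thinned_colors(merged_alpha):
    """Build an (N, 4) RGBA array of black markers with per-point opacity."""
    rgba = np.zeros((len(merged_alpha), 4))
    rgba[:, 3] = merged_alpha
    return rgba
```

The result could then be drawn with something like `ax.scatter(cx, cy, s=10, c=thinned_colors(a))`. Note that this uses filled markers rather than the hollow circles of the example, and the appearance is only an approximation of the original plot; the number of saved markers is bounded by the number of occupied cells rather than the number of points.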

I tried several options, such as setting dpi=300 and transparent=False, but since PDF stores the figure as a vector image, this naturally didn't change anything.

Things that might work, but have drawbacks:

  • hexbin plots: these work for a single scatter plot if the resolution and cmap are adjusted correctly, but I want to plot a scatter matrix with (n x n) scatter plots. As far as I know, there is no hexbin-matrix plot.
  • saving to a rasterized format: the plots are for a journal which requests vector plots wherever possible, so I'd like to avoid storing the image as a raster image.
  • randomly extracting parts of the data: this might work, but it will alter the appearance of the plots.
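A middle ground between the first two bullets is matplotlib's per-artist `rasterized` flag, which embeds only the markers as a bitmap while the axes, ticks, and text remain vector objects in the PDF. The sketch below assumes this mixed output is acceptable to the journal; `save_mixed_pdf` is a hypothetical helper name, and the point count and dpi are placeholder values.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def save_mixed_pdf(path, n_points=20000, dpi=300, seed=0):
    """Save a scatter plot whose markers are rasterized inside a vector PDF.

    (Hypothetical helper, for illustration only.)
    """
    rng = np.random.default_rng(seed)
    fig, ax = plt.subplots(figsize=(5, 5))
    points = ax.scatter(
        rng.normal(size=n_points), rng.normal(size=n_points),
        s=10, marker='o', facecolors='none', edgecolors='black', alpha=.3,
        rasterized=True)  # only this artist is stored as a bitmap
    # dpi sets the resolution of the embedded bitmap; axes and text stay vector.
    fig.savefig(path, format='pdf', dpi=dpi)
    plt.close(fig)
    return points
```

With this approach the size of the marker layer depends on the figure size and dpi rather than on the number of points, so unlike with a fully vector figure, the dpi setting does make a difference here.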

Any ideas?
Thanks in advance!

2 Answers

Maybe you want to change your approach and use something other than a scatter plot, leaving the task of downsampling your data set to NumPy and Matplotlib. In other words, use NumPy's histogram2d and Matplotlib's imshow:

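    import numpy as np
    import matplotlib.pyplot as plt

    x, y = [np.random.normal(size=100000) for _ in range(2)]
    h, xedge, yedge = np.histogram2d(x, y, bins=25)
    cmap = plt.get_cmap('Greys')
    # histogram2d puts x along the first axis, so transpose for imshow
    plt.imshow(h.T, interpolation='lanczos', origin='lower', cmap=cmap,
               extent=[xedge[0], xedge[-1], yedge[0], yedge[-1]])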


    plt.savefig('Figure1.pdf')  # → 30384 bytes

Grid arrangement (this time using hexbin)

    np.random.seed(20190308)
    fig, axes = plt.subplots(3, 2, figsize=(4, 6),
                             subplot_kw={'xticks': [], 'yticks': []})
    fig.subplots_adjust(hspace=0.05, wspace=0.05)
    # One hexbin per axes; cmap is the 'Greys' colormap defined above
    for ax in axes.flat:
        ax.hexbin(*(np.random.normal(size=10000) for _ in ('x', 'y')), cmap=cmap)
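The same grid idea can be pushed towards an (n x n) matrix in the spirit of pandas' scatter_matrix. What follows is only a rough sketch: `hexbin_matrix` is a hypothetical helper name, the data is assumed to be an array of shape (n_samples, n_variables), and the diagonal histograms and outer-edge labels are layout choices that you may well want to change.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def hexbin_matrix(data, names=None, gridsize=25, cmap='Greys', figsize=(6, 6)):
    """Draw an n x n grid: hexbin off the diagonal, histogram on it.

    `data` has shape (n_samples, n_variables).
    (Hypothetical helper, for illustration only.)
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[1]
    if names is None:
        names = [f'x{i}' for i in range(n)]
    fig, axes = plt.subplots(n, n, figsize=figsize, squeeze=False)
    fig.subplots_adjust(hspace=0.05, wspace=0.05)
    for row in range(n):
        for col in range(n):
            ax = axes[row, col]
            if row == col:
                ax.hist(data[:, col], bins=gridsize, color='grey')
            else:
                # Column variable on x, row variable on y.
                ax.hexbin(data[:, col], data[:, row],
                          gridsize=gridsize, cmap=cmap, mincnt=1)
            # Label only the outer edge of the grid.
            if row == n - 1:
                ax.set_xlabel(names[col])
            else:
                ax.set_xticks([])
            if col == 0:
                ax.set_ylabel(names[row])
            else:
                ax.set_yticks([])
    return fig, axes
```

A call such as `fig, axes = hexbin_matrix(np.random.normal(size=(100000, 3)))` followed by `fig.savefig('matrix.pdf')` would then store hexagons rather than individual points; axis limits are not shared between panels in this sketch.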


5 Comments

This is a really nice approach, thanks! Unfortunately it has the same flaw as the hexbin plots: no (n x n) matrix plotting is supported. I am currently in the last steps of making a small tool to create a hexbin-matrix plot and will post it as soon as it is done.
Do you want to make a grid of {hexbins, imshows}? You can create a grid of axes and then use the hexbin method of each of the axes; the difficult part is making the grid to your exact specs.
Yep, I should have made that clearer in my question. I only stated it in the list of methods with drawbacks. Sorry for that!
I have posted hexbins in a grid. It's rough and you will probably want to make various tweaks, e.g., the same x and y limits, axes on the external borders, legends etc.
Thanks a lot. As already said, I am almost finished with my own implementation of a hexbin matrix, which is made to resemble pandas' pd.plotting.scatter_matrix as much as possible. I'll post it if someone is interested in it. I'll still accept your answer in appreciation of the effort you put into it. :) Thanks again!

This may be a cheat, but you could save the figure as a .png file, insert it into a PDF canvas via LaTeX, and fit the document margins to the figure.
