Plotting data with thousands of points and their labels (groups) with matplotlib

Question

I have an array of 500,000 samples, i.e. the data has shape (500000, 3), where the first two columns are the x- and y-coordinates and the third column is the label value of the data point at (x, y).

For example: data = [[20, 10, 12.3320], [22, 13, 230.221], ....., [..]]

I tried the method below, but it is too slow and the resulting plot is hard to interpret.

    import matplotlib.pyplot as plt

    colors = 10*['r.','g.','b.','c.','k.','y.','m.']
    for i in range(len(labels)):
        plt.scatter(data[i][0], data[i][1], colors[labels[i]],marker='.')
    plt.show()

Is there another method, such as imshow(), that suits this data and gives a plot that is easier to interpret?

To use imshow, the data must be equally spaced on a grid. Is this the case? Can you tell us more about how your data is structured in the columns?
– ImportanceOfBeingErnest, Commented Feb 1, 2017 at 23:08
The data is structured like this:

    array([[ 0.19975574, 0.10402092, 0.00029645],
           [ 0.19975574, 0.10727158, 0.00029645],
           [ 0.19975574, 0.11052223, 0.00029645],
           [ 0.19975574, 0.11377289, 0.00029645],
           [ 0.19975574, 0.11702354, 0.00029645],
           [ 0.19975574, 0.12027419, 0.00029645],
           [ 0.19975574, 0.12352485, 0.00029645],
           [ 0.19975574, 0.1267755 , 0.00029645],
           [ 0.19975574, 0.13002616, 0.00029645],
           [ 0.19975574, 0.13327681, 0.00029645],...........])
– raju bhai, Commented Feb 2, 2017 at 10:04
The data is scaled to have unit variance along each axis, which is why it looks like the array above.
– raju bhai, Commented Feb 2, 2017 at 10:06
Don't put your data in the comments. You can also answer questions from the comment section simply by editing your question. Showing the original data makes things a bit complicated; to show the structure, use some other data, in the sense of a minimal reproducible example.
– ImportanceOfBeingErnest, Commented Feb 2, 2017 at 10:11

Alexandre Kempf · Accepted Answer · 2017-02-07 15:33:32Z

The scatter function in matplotlib is quite slow. I would recommend using vispy, which uses the GPU to plot a large number of points:

This works with vispy 0.4.0, which you can install with pip or conda:

pip install vispy

Here is the code (it plots in less than 2 seconds on my computer):

    import numpy as np
    from vispy import scene, visuals, app
    import matplotlib.pyplot as plt

    data = np.random.random((500000,3))

    canvas = scene.SceneCanvas(keys='interactive', show=True)
    view = canvas.central_widget.add_view()

    # Create the scatter plot
    scatter = scene.visuals.Markers()
    scatter.set_data(data[:,:2], face_color=plt.cm.jet(data[:,2]))
    view.add(scatter)

    view.camera = scene.PanZoomCamera(aspect=1)
    view.camera.set_range()

    app.run()
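The example above colors random values that already lie in [0, 1), which is the range a matplotlib colormap expects for float input. The label values in the question (12.3320, 230.221) fall outside that range, so they would probably need rescaling first. Below is a minimal sketch, assuming a simple linear min/max normalization is acceptable; the helper name `labels_to_rgba` is hypothetical and not part of vispy or matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize


def labels_to_rgba(labels, cmap_name="jet"):
    """Map arbitrary float labels to RGBA rows (hypothetical helper).

    Assumes linear min/max scaling is appropriate for the labels.
    """
    labels = np.asarray(labels, dtype=float)
    # Rescale labels linearly so the smallest maps to 0 and the largest to 1.
    norm = Normalize(vmin=labels.min(), vmax=labels.max())
    cmap = plt.get_cmap(cmap_name)
    # Calling a colormap on an array returns one RGBA row per input value.
    return cmap(norm(labels))


# Sample labels in the spirit of the question's example data.
colors = labels_to_rgba([12.3320, 230.221, 121.0])
```

The resulting (N, 4) array can be passed as `face_color` in place of `plt.cm.jet(data[:,2])`.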

Vispy has good documentation, and you can customize the plot through the arguments of the set_data function, such as face_color, edge_color, size, edge_width, symbol ...
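If you would rather stay within matplotlib, much of the slowdown in the question likely comes from calling scatter once per point. A single vectorized call that passes the label column through `c=` is usually far faster, although it may still be slower than a GPU-based approach. This is a sketch on synthetic data, not a benchmark; the marker size and the colormap are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the (500000, 3) array: x, y, label value.
rng = np.random.default_rng(0)
data = rng.random((500_000, 3))

fig, ax = plt.subplots()
# One call for all points; matplotlib maps the label column to colors.
points = ax.scatter(data[:, 0], data[:, 1], c=data[:, 2],
                    s=1, marker=".", cmap="jet")
# The colorbar shows which color corresponds to which label value.
fig.colorbar(points, ax=ax, label="label value")
# Call fig.savefig("scatter.png") to write the figure to disk.
```

With an interactive backend, replace the `matplotlib.use("Agg")` line and finish with `plt.show()` instead.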

Good luck with your data visualization ;)

Note: if you get a black screen with no markers, see this vispy issue: github.com/vispy/vispy/issues/1085
Hi there, I ran this code in an IPython notebook and nothing was displayed. Do you know what the issue is?
Try launching it as a Python script rather than in an IPython notebook :)
