Clustering in a 3D star-field using Count-in-Cells in Python

Question

First time poster, couldn't find anything that totally solved my issue.

I'm working on a simulation for galactic colonisation for my masters project. A thing I'm trying to do is look a the voids of uncolonised stars left over after the simulation ends and to see if there is clustering behaviour past statistical fluctuations. As it is a monte-carlo numerical problem, a correlation function is no really appropriate so I am using the counts-in-cells method usually employed to look at galaxy clusters.

So I am working in cartesians

data = np.genfromtxt('counts.csv') # positions of uncolonsed stars x = data[:,0] y = data[:,1] z = data[:,2]

What I want to do is use boxes of varying sizes to count the number of stars inside the box and compare to what the mean should be and do statistics on the results.

The direction I'm going in is looking at some sort of 3D histogram such as the bubble plot seen here. I tried this out and it doesn't seem to be binning all my data and I'm not sure why, i.e, the 'floor' of the cube has 'bubbles' but much of the 'roof' has nothing:

3D Bubble Histogram

This is clearly wrong when you look at the plotted raw star field:

Plotted star field

It looks like the bins at higher z values aren't holding any data. This is probably a pretty straightforward problemm but I am in need of some fresh eyes and minds that are better at python than I am.

Can anyone think as to how this is can be fixed? Also I'd like to find a way to count the number of points per box, i.e per bin.

I'm sorry if I'm being a bit dim but I appreciate any help any of you fine fellows can offer me.

Thanks chums!

The code in the answer you linked looks overly complicated to me, maybe I just don't understand what they are trying to do. To compute histograms on any dimension just use numpy.histogramdd. — Stop harming Monica
– Stop harming Monica, Commented Dec 9, 2016 at 13:49
You can run a KDE on your sample and visualize it through volume rendering. See, for example, here. — Vlas Sokolov
– Vlas Sokolov, Commented Dec 9, 2016 at 14:45
The KDE treatment looks really good, thank you Vlas. Only thing left for me to do is count number of points per bin. Confused as to why scipy.stats.binned_statistic_dd is coming up with errors like: TypeError: binned_statistic_dd() takes at least 2 arguments (5 given) — Skippeh
– Skippeh, Commented Dec 9, 2016 at 22:32
@Skippeh Difficult to tell what's wrong with code we can't see. As I already said you can count points per bin using numpy.histogramdd. — Stop harming Monica
– Stop harming Monica, Commented Dec 10, 2016 at 22:27

armatita · Accepted Answer · 2016-12-11 21:28:41Z

In the comments you had some alternatives to solve your problem and its difficult to say what is wrong with your code without seeing it. In any case this kind of problem is typically solved by counting data inside a regular grid (which is, nevertheless, a general approach to do an histogram).

The advantage in building your own grid is that you know immediately where every "sector" is, where it starts and where it ends. As so I would suggest the following approach as an alternative if you want to try it.

import numpy as np from mpl_toolkits.mplot3d import Axes3D import matplotlib.pyplot as plt # Generating some random data. data = np.random.randint(0, 100, (1000,3)) x, y, z = data[:, 0], data[:, 1], data[:, 2] # Generating raw view fig = plt.figure() ax = fig.add_subplot(111, projection='3d') ax.scatter(x, y, z, marker='+', s=25, c='r') plt.show() # Generating some grid with origin, cell size, and number of cells 10 10 10 numx, numy, numz = 5, 5, 5 origx, origy, origz = 0, 0, 0 sizex, sizey, sizez = 20, 20, 20 grid = np.vstack(np.meshgrid(range(numx), range(numy), range(numz))).reshape(3, -1).T gx, gy, gz = grid[:, 0]*sizex + origx, grid[:, 1]*sizey + origy, grid[:, 2]*sizez + origz # Calculating the number of stars in each cell: ix = ((x - origx)/sizex).astype(int) iy = ((y - origy)/sizey).astype(int) iz = ((z - origz)/sizez).astype(int) s = np.zeros((numx, numy, numz)) for i in range(ix.shape[0]): s[ix[i], iy[i], iz[i]] = s[ix[i], iy[i], iz[i]] + 1 s = s.flatten() mask = s > 0 # Plotting the result fig = plt.figure() ax = fig.add_subplot(111, projection='3d') ax.scatter(gx[mask], gy[mask], gz[mask], marker='o', s=s[mask]*100, c='b', edgecolor ="r") plt.show()

The result for randomized data is this:

Collectives™ on Stack Overflow

Clustering in a 3D star-field using Count-in-Cells in Python

1 Answer 1

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Linked

Related