How to generate non-overlapping random points uniformly and evenly within N-dimensional spaces or dataset between low and high range

Question

I have tried to generate random points for an NxM dataset, using the lowest value of each of the M columns as the lower bound and the highest value of each column as the upper bound.

Here is the code:

    import numpy as np

    def generate_random_points(dataset, dimension_based=False):
        dimension = dataset.shape[1]
        if not dimension_based:
            # Use the smaller of floor(sqrt(#columns)) and floor(sqrt(#rows)).
            row_size = min(
                np.floor(np.sqrt(dimension)).astype(int),
                np.floor(np.sqrt(dataset.shape[0])).astype(int),
            )
        else:
            row_size = np.floor(np.sqrt(dimension)).astype(int)
        # Draw uniformly inside the axis-aligned bounding box of the dataset.
        generated_spikes = np.random.uniform(
            low=np.min(dataset, axis=0),
            high=np.max(dataset, axis=0),
            size=(row_size, dimension),
        )
        return generated_spikes

But the problem is that most of the random points lie on the boundaries or edges of the dataset space rather than being uniformly and evenly distributed.

Here is a plot of one example; the random points are the black ones:

I have also tried applying PCA, taking the low and high ranges in PCA space, and mapping those ranges back with inverse_transform. As might be expected, the random points are still not distributed uniformly and evenly.

    def generate_random_points(dataset, dimension_based=False):
        dimension = dataset.shape[1]
        dimension_pca = min(dataset.shape[0], dataset.shape[1])
        # perform_PCA and perform_PCA_inverse are my own helpers (not shown here).
        pca, dataset_pca = perform_PCA(dimension_pca, dataset)
        low_pca = np.min(dataset_pca, axis=0)
        high_pca = np.max(dataset_pca, axis=0)
        low = perform_PCA_inverse(pca, low_pca)
        high = perform_PCA_inverse(pca, high_pca)
        if not dimension_based:
            # Use the smaller of floor(sqrt(#columns)) and floor(sqrt(#rows)).
            row_size = min(
                np.floor(np.sqrt(dimension)).astype(int),
                np.floor(np.sqrt(dataset.shape[0])).astype(int),
            )
            generated_spikes = np.random.uniform(low=low, high=high, size=(row_size, dimension))
            return generated_spikes
        else:
            row_size = np.floor(np.sqrt(dimension)).astype(int)
            # Note: this branch uses the raw dataset bounds, not the PCA-derived ones.
            generated_spikes = np.random.uniform(
                low=np.min(dataset, axis=0),
                high=np.max(dataset, axis=0),
                size=(row_size, dimension),
            )
            return generated_spikes

How can I solve this so that the randomly generated points are more evenly distributed, instead of piling up on two edges, and also do not overlap?

I need something like this:

the red one is the position required for the black points which are crossed

P.S:

Both images are PCA representations of a dataset with shape (46, 2730), i.e. 46 rows and 2730 dimensions.
I was thinking of using the second answer to this question: "Algorithm for generating uniformly distributed random points on the N-sphere". But I am not sure how to calculate the radius (R) of an N-dimensional dataset, or whether that even makes sense, so that I can apply that answer.

Please help!

Hi, in what area do you want your points to be uniformly distributed? In the square [-50, 85] x [-50, 85]?
– Lukas S, Commented Oct 9, 2021 at 14:44
@user2640045, I want my points to be evenly distributed as if they were inside the graph (maybe that's what you mean by square?), and not have all of the random points just pile up on the edge.
– Shihab Ullah, Commented Oct 9, 2021 at 15:00
The points below are uniformly distributed in the square [-50, 85] x [-50, 85]. Though maybe you meant distributed like the points you have in your picture. In that case you would have to give me the coordinates of the points.
– Lukas S, Commented Oct 9, 2021 at 15:29
@user2640045, you can use the make_blobs function of sklearn to get a random dataset and try it there.
– Shihab Ullah, Commented Oct 9, 2021 at 16:56
Well, I think I will leave answering this to somebody else, thank you.
– Lukas S, Commented Oct 9, 2021 at 16:59

DanielTuzes · Accepted Answer · 2021-10-18 23:20:30Z

To better understand the question and to give some hints about possible causes of your problem, I am posting this as an answer because it does not fit into a comment.

Description

Let me describe your problem in my own words; please correct me, or edit your question, to make your case clearer.

You are given two sets of N_1 and N_2 points in an M-dimensional space. The points in each set may be normally distributed in that space, e.g. if you create them with make_blobs. You then identify the minimum value x_{i,min,1} and the maximum value x_{i,max,1} along each dimension x_i over all points in the first set (the one with N_1 points). Finally, you generate random points in the M-dimensional space within the M-dimensional rectangle spanning the range

[x_{1,min,1},x_{1,max,1}] x [x_{2,min,1},x_{2,max,1}] x ... x [x_{M,min,1},x_{M,max,1}]

Then you apply PCA and plot the first two principal components. Your observation is that your random points are not uniformly distributed within the region where your data lies.

Explanation and example in 2D

If your data follows an M-dimensional normal distribution (in this example, M=2), the minimum and maximum values can lie several standard deviations away from the mean. When you generate random points uniformly between the minimum and maximum values, they cover the whole range evenly, including the outer regions where you have barely any data points. Take the following as an example. It generates 10,000 data points with a normal distribution in 2D, and then generates 5 further points with a uniform distribution in the rectangle drawn around the data points.

    import matplotlib.pyplot as plt
    import numpy as np

    np.random.seed(3)
    x_data = np.random.normal(size=10000)
    x_min = x_data.min()
    x_max = x_data.max()
    y_data = np.random.normal(size=10000)
    y_min = y_data.min()
    y_max = y_data.max()
    random_x = np.random.uniform(x_min, x_max, size=5)
    random_y = np.random.uniform(y_min, y_max, size=5)

    fig, ax = plt.subplots()
    ax.plot(x_data[:10000], y_data[:10000], "o", label="data points with normal distribution")
    ax.plot(random_x, random_y, "o", label="random points with uniform distribution")
    ax.legend()
    plt.show()

The output of the code is shown below:

Although the random points are uniformly distributed, one may think they lie only at the edges of the distribution. In some respects, the situation only gets worse in higher dimensions. Imagine the M-dimensional unit sphere and the cube enclosing it. The ratio of the volume of the sphere to the volume of the cube tends to 0 as M grows. This means that if you generate random points in the cube while your data is (mainly) located within the sphere, the fraction of your random points that fall outside the region of your data points tends to 1. However, if you simply drop the extra dimensions with PCA, you cannot fully see this in the 2D plot.
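To make the sphere-versus-cube argument concrete, here is a minimal sketch that compares the exact volume ratio with a Monte Carlo estimate. It assumes the unit ball of radius 1 inscribed in the cube [-1, 1]^M; the function names are made up for this illustration and are not part of any library.

```python
import math

import numpy as np


def ball_to_cube_volume_ratio(m: int) -> float:
    """Volume of the unit M-ball divided by the volume of the cube [-1, 1]^M."""
    ball_volume = math.pi ** (m / 2) / math.gamma(m / 2 + 1)
    cube_volume = 2.0 ** m
    return ball_volume / cube_volume


def fraction_inside_ball(m: int, n_points: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same ratio: share of uniform cube points inside the ball."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(-1.0, 1.0, size=(n_points, m))
    return float(np.mean(np.sum(points**2, axis=1) <= 1.0))


for m in (2, 3, 5, 10, 20):
    print(f"M={m:2d}  exact={ball_to_cube_volume_ratio(m):.3e}  "
          f"sampled={fraction_inside_ball(m):.3e}")
```

For M=2 the ratio is pi/4, and it drops quickly from there. With a fixed number of samples, the sampled fraction for larger M will typically be exactly 0, because practically no uniform point lands inside the ball any more.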

Suggestion

If I understood your problem correctly and the problem is just an illusion, please rephrase your question accordingly so that others can address your specific request.

If you want your random points to better reflect the distribution of your data, you need to fit a model to your data, e.g. assume it is normally distributed. Estimate the mean and standard deviation, and generate random points from a distribution with those properties.
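A minimal sketch of that suggestion, under the assumption that each column can be modelled as an independent normal distribution (so correlations between columns are ignored). The function name sample_from_fitted_normal is hypothetical, and the demo dataset is synthetic; only its shape (46, 2730) is taken from the question.

```python
import numpy as np


def sample_from_fitted_normal(dataset, n_points, seed=None):
    """Fit an independent normal per column and draw n_points samples from it."""
    rng = np.random.default_rng(seed)
    mean = dataset.mean(axis=0)
    std = dataset.std(axis=0, ddof=1)  # sample standard deviation per column
    # loc and scale broadcast across the rows of the requested output shape.
    return rng.normal(loc=mean, scale=std, size=(n_points, dataset.shape[1]))


# Synthetic stand-in for the real data, with the shape mentioned in the question.
demo_data = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(46, 2730))
random_points = sample_from_fitted_normal(demo_data, n_points=6, seed=1)
print(random_points.shape)
```

Modelling the columns independently is a deliberate simplification: with 46 rows and 2730 columns the sample covariance matrix is rank-deficient, so a full multivariate normal fit would need some form of regularisation or dimensionality reduction first.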

Further questions

Could you please show more data points?
Is it relevant that you have 2 datasets?
I didn't understand the figure here:

"the red one is the position required for the black points which are crossed"

Could you please replot your figure, provide more examples, and rephrase the legend?
