
I want to simulate a variable that takes values between 0 and 1, but I also want 80% of its values to be zero. Currently I am doing the following:

    data['response'] = np.random.uniform(0, 1, 15000)  # simulate response
    data['response'] = data['response'].apply(lambda x: 0 if x < 0.85 else x)

But this leaves only the extreme values (0 and .8+) in the variable. I want 80 percent of the rows to be zero and the remaining 20% to have values between zero and one. The zero rows have to be chosen randomly.

You can generate the zeros (80 percent of the total), then append a quarter as many values between zero and one, and then shuffle the result. Commented Apr 2, 2017 at 11:32

4 Answers


Here's another one using numpy.random.shuffle

    import numpy as np

    # Proportion of zeros in the final array
    proportion = .8
    n_non_zeros = 200

    # Generate fake non-zero data.
    # Inversion ensures the values lie in (0, 1], so none of them is exactly 0
    non_zeros = 1 - np.random.uniform(size=n_non_zeros)

    # Append [proportion / (1 - proportion)] zeros for each non-zero
    n_zeros = int(round(n_non_zeros * proportion / (1 - proportion)))
    data = np.concatenate([non_zeros, np.zeros(n_zeros)])

    # Shuffle data in place
    np.random.shuffle(data)

    # 'data' now contains 200 non-zeros and 800 zeros,
    # which are 20% and 80% of 1000

Building on your code, you can just rescale x when it is larger than 0.8:

lambda x: 0 if x < 0.8 else 5 * (x - 0.8)
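A vectorized sketch of the same rescaling, assuming NumPy and the array length of 15000 from the question. The helper name `zero_inflated_uniform` and the `seed` argument are illustrative and not part of the original answer. Values below the threshold become 0, and the rest are stretched from [0.8, 1) onto [0, 1). Note that the share of zeros is only approximately 80%, since it depends on the random draw.

```python
import numpy as np

def zero_inflated_uniform(n, zeros_ratio=0.8, seed=None):
    """Hypothetical helper: threshold a uniform draw, then rescale the rest."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n)
    # Stretch the surviving interval [zeros_ratio, 1) back onto [0, 1)
    scale = 1.0 / (1.0 - zeros_ratio)
    return np.where(x < zeros_ratio, 0.0, scale * (x - zeros_ratio))

response = zero_inflated_uniform(15000, seed=0)
print((response == 0).mean())  # fraction of zeros, close to 0.8
```

Using `np.where` on the whole array avoids the per-element Python call that `apply` with a lambda makes.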


Here's one approach with np.random.choice, which suits this case: with its optional argument replace set to False (or 0), it generates unique indices along the entire length of 15000. We then generate the random numbers with np.random.uniform and assign them at those indices.

Thus, the implementation would look something along these lines -

    import numpy as np

    # Parameters
    s = 15000          # Length of array
    zeros_ratio = 0.8  # Ratio of zeros expected in the array

    out = np.zeros(s)  # Initialize output array
    nonzeros_count = int(np.rint(s*(1-zeros_ratio)))  # Count of nonzeros in array

    # Generate unique indices where nonzeros are to be placed
    idx = np.random.choice(s, nonzeros_count, replace=False)

    # Generate nonzeros between 0 and 1
    nonzeros_num = np.random.uniform(0, 1, nonzeros_count)

    # Finally assign into those unique positions
    out[idx] = nonzeros_num

Sample run results -

    In [233]: np.isclose(out, 0).sum()
    Out[233]: 12000

    In [234]: (~np.isclose(out, 0)).sum()
    Out[234]: 3000
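For reuse, the index-based approach could be wrapped in a function along these lines. This is a sketch rather than the answer's own code: it swaps in NumPy's newer `Generator` API (`np.random.default_rng`) for the legacy `np.random` functions, and the name `exact_zero_fraction` and the `seed` argument are made up for illustration. It also draws the nonzeros as `1 - U`, so that none of them can be exactly 0.

```python
import numpy as np

def exact_zero_fraction(n, zeros_ratio=0.8, seed=None):
    """Hypothetical helper: array of length n with an exact count of zeros."""
    rng = np.random.default_rng(seed)
    out = np.zeros(n)
    nonzeros_count = int(round(n * (1 - zeros_ratio)))
    # Unique positions for the nonzero entries (sampling without replacement)
    idx = rng.choice(n, size=nonzeros_count, replace=False)
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]
    out[idx] = 1.0 - rng.random(nonzeros_count)
    return out

out = exact_zero_fraction(15000, seed=0)
print(np.count_nonzero(out), (out == 0).sum())
```

Unlike thresholding a uniform draw, this fixes the number of zeros exactly instead of hitting 80% only on average.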


We could draw numbers from a uniform distribution extended to the negative side, then take max with zero:

    >>> numpy.maximum(0, numpy.random.uniform(-4, 1, 15000))
    array([ 0.57310319,  0.        ,  0.02696571, ...,  0.        ,  0.        ,  0.        ])
    >>> a = _
    >>> sum(a <= 0)
    12095
    >>> sum(a > 0)
    2905
    >>> 12095 / 15000
    0.8063333333333333

Here -4 is used because 4 / (4 + 1) = 80%.
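The same reasoning generalizes: for a target zero fraction p, the lower bound is -p / (1 - p), because the negative part of the interval must then make up a fraction p of its total length. A minimal sketch, assuming NumPy; the function name `clipped_uniform` and the `seed` argument are illustrative only.

```python
import numpy as np

def clipped_uniform(n, zeros_ratio=0.8, seed=None):
    """Hypothetical helper: uniform draw on [low, 1), clipped at zero."""
    rng = np.random.default_rng(seed)
    # For zeros_ratio = 0.8 this gives low = -4 (up to floating-point rounding)
    low = -zeros_ratio / (1.0 - zeros_ratio)
    return np.maximum(0.0, rng.uniform(low, 1.0, n))

a = clipped_uniform(15000, seed=0)
print((a == 0).mean())  # close to 0.8, but not exactly 0.8
```

As in the session above, the zero fraction is random, so expect small deviations from the target.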


Since the result is a sparse array, perhaps a SciPy sparse matrix is more appropriate.

    >>> a = scipy.sparse.rand(1, 15000, 0.2)
    >>> a.toarray()
    array([[ 0.        ,  0.03971366,  0.        , ...,  0.        ,  0.        ,  0.9252341 ]])

Here 0.2 = 1 − 0.8 is the density of the array. The nonzero numbers are distributed uniformly between 0 and 1.
