Random words generate using python

Question

I have a list of words and a count:

    count = 100
    words = ['apple', 'orange', 'mango']

For the count above, is it possible to use the random module to select apple 40% of the time, orange 30% of the time, and mango 30% of the time?

For example, for count = 100: apple 40 times, orange 30 times, and mango 30 times.

The selection has to happen randomly.

Community · Accepted Answer · 2017-05-23 12:00:59Z

Based on an answer to the question about generating discrete random variables with specified weights, you can use numpy.random.choice to get 20 times faster code than with random.choice:

    from collections import Counter
    from numpy.random import choice

    sample = choice(['apple', 'orange', 'mango'], p=[0.4, 0.3, 0.3], size=1000000)
    print(Counter(sample))

Outputs:

Counter({'apple': 399778, 'orange': 300317, 'mango': 299905})

Not to mention that it is actually easier than "to build a list in the required proportions and then shuffle it".

Also, shuffle would always produce exactly 40% apples, 30% oranges and 30% mangoes, which is not the same as "produce a sample of a million fruits according to a discrete probability distribution". The latter is what both choice solutions do (and the bisect one too). As can be seen above, numpy yields about 40% apples, and so on.
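The difference between exact proportions and sampling from a distribution can be checked with the standard library alone. This is a minimal sketch, assuming Python 3.6+ (for `random.choices`); the function names `exact_mix` and `sampled_mix` are hypothetical and do not come from either answer.

```python
import random
from collections import Counter

FRUITS = ['apple', 'orange', 'mango']
WEIGHTS = [0.4, 0.3, 0.3]

def exact_mix(count, rng):
    """Build a list with fixed proportions, then shuffle the order."""
    result = []
    for fruit, weight in zip(FRUITS, WEIGHTS):
        # Rounding may not add up to `count` for every count; 100 works.
        result.extend([fruit] * round(count * weight))
    rng.shuffle(result)
    return result

def sampled_mix(count, rng):
    """Draw each item independently from the weighted distribution."""
    return rng.choices(FRUITS, weights=WEIGHTS, k=count)

rng = random.Random(0)  # seeded only to make the run repeatable
exact_counts = Counter(exact_mix(100, rng))
sampled_counts = Counter(sampled_mix(100, rng))
print(exact_counts)    # always 40 apples, 30 oranges, 30 mangoes
print(sampled_counts)  # near 40 / 30 / 30, but it varies with the seed
```

Which one is appropriate depends on whether the counts themselves must be exact or only the probabilities.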

Raymond Hettinger · Accepted Answer · 2016-05-28 08:24:32Z

The easiest way is to build a list in the required proportions and then shuffle it.

    >>> import random
    >>> result = ['apple'] * 40 + ['orange'] * 30 + ['mango'] * 30
    >>> random.shuffle(result)

Edit for the new requirement that the count is really 1,000,000:

    >>> count = 1000000
    >>> pool = ['apple'] * 4 + ['orange'] * 3 + ['mango'] * 3
    >>> for i in range(count):
    ...     print(random.choice(pool))

A slower but more general alternative approach is to bisect a cumulative probability distribution:

    >>> import bisect
    >>> choices = ['apple', 'orange', 'mango']
    >>> cum_prob_dist = [0.4, 0.7]
    >>> for i in range(count):
    ...     print(choices[bisect.bisect(cum_prob_dist, random.random())])
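The bisect idea can be wrapped so that the cumulative distribution is derived from the weights instead of being typed by hand. The following is a sketch only: `make_weighted_picker` and its `uniform` parameter are hypothetical names introduced for illustration, and the weights are assumed to be positive numbers that need not sum to 1.

```python
import bisect
import itertools
import random

def make_weighted_picker(items, weights, uniform=random.random):
    """Return a zero-argument function that picks one item per call."""
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    last = len(items) - 1

    def pick():
        # Scale the uniform draw from [0, 1) onto [0, total).
        point = uniform() * total
        # Clamp in case floating-point rounding lands exactly on total.
        return items[min(bisect.bisect(cumulative, point), last)]

    return pick

pick_fruit = make_weighted_picker(['apple', 'orange', 'mango'], [4, 3, 3])
print([pick_fruit() for _ in range(5)])
```

Passing the uniform source as a parameter is a design choice made here so the selection logic can be exercised with fixed values; it is not part of the original answer.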

But if count = 1000000, the list size will increase, right? I'm actually trying to simulate a data set of 1,000,000 rows per day over a period of one month. Would it be good to use the same logic?

The concept is perfectly general and there are many ways to build on it. I edited the answer to show how to use random.choice() to make one selection at a time from a pool where the elements are in the proper proportions. You could also make a cumulative distribution and use bisect for the selection, but that would have been overkill for the way you described your problem.
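For the month-long simulation mentioned in the comments, one option is to generate rows lazily so that memory use does not grow with the row count. This is a rough sketch under assumptions: the name `simulate_month`, the 30-day default, and the `(day, word)` row shape are all hypothetical, since the comment does not describe the data set's layout.

```python
import random

def simulate_month(rows_per_day, days=30, seed=None):
    """Lazily yield (day, word) pairs, one row at a time."""
    rng = random.Random(seed)
    # The pool stays at 10 elements regardless of how many rows are drawn.
    pool = ['apple'] * 4 + ['orange'] * 3 + ['mango'] * 3
    for day in range(1, days + 1):
        for _ in range(rows_per_day):
            yield day, rng.choice(pool)

# Small demonstration; the asker's case would use rows_per_day=1000000.
rows = list(simulate_month(rows_per_day=5, days=2, seed=42))
print(len(rows))  # 10
```

In a real run the generator would be consumed row by row (for example, written to a file) rather than collected into a list.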
