Creating pandas dataframe in normal distribution

Question

I want to create a sample dataframe -- based on a json template -- that looks as real as possible. Hence normal distribution.

This is what I have tried:

    import json, random
    import pandas as pd

    sample_data = """{"product1":[ {"category":"Fruits", "productlist":["Bell Peppers","Red Chillies", "Onions", "Tomatoes"]} ],
    "product2":[ {"category":"Vegetables", "productlist":["Apple","Mango","Banana"]} ]}"""

    products = json.loads(sample_data)
    colHeaders = []
    for k,v in products.items():
        colHeaders.append(v[0]['category'])

    df = pd.DataFrame(columns= colHeaders)
    for i in range (1000):
        itemlist = []
        for k,v in products.items():
            itemlist.append(random.choice(v[0]['productlist']))
        #print(itemlist)
        df.loc[len(df)] = itemlist

    print(df)

I am not sure I am doing this correctly. If not, please help me with:

How to check if the data frame rows represent a normal distribution?
How to try other distributions in this case?

Other related Stack Overflow questions I have referred to are:

Lara Ipek · Accepted Answer · 2021-01-11 18:15:32Z


I think what you should do is generate integers from a normal distribution and use them as indices into the list. Graphing the numbers you generate is, in my opinion, the best way to check whether they are normally distributed: the plot should resemble the bell shape of the normal distribution. However, since 20 is such a small number, it may not show exactly the desired shape, which is something to keep in mind. I think the following link has all the information you need.

How to generate a random normal distribution of integers
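
A minimal sketch of the idea above, using only the standard library. The helper name `normal_index`, the mean at the middle of the list, the spread of a quarter of the list length, and the seed are all assumptions made for illustration; the answer does not specify them. The draws are rounded and clipped to valid indices, and a rough text histogram stands in for a plot:

```python
import random
from collections import Counter


def normal_index(n, rng):
    """Draw an integer index in [0, n - 1], centred on the middle of the list."""
    mean = (n - 1) / 2          # assumed centre: the middle of the list
    std = n / 4                 # assumed spread, chosen only for illustration
    idx = round(rng.gauss(mean, std))
    return min(max(idx, 0), n - 1)  # clip out-of-range draws to the ends


rng = random.Random(42)  # fixed seed so the run is repeatable
productlist = ["Bell Peppers", "Red Chillies", "Onions", "Tomatoes"]

draws = [normal_index(len(productlist), rng) for _ in range(1000)]
counts = Counter(draws)

# Rough text histogram: one '#' per 10 draws of each item.
for i, name in enumerate(productlist):
    print(f"{name:<13}{'#' * (counts[i] // 10)}")
```

With a list this short the "bell" has only four bars, so the middle items simply come out more often than the ones at the ends. Clipping also piles the tails onto the first and last items, which is one reason the shape is only approximate.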



4 Comments

kingmakerking Over a year ago

`productlist = ["Apple","Mango","Banana"]` can be treated as `productlist = [0,1,2]`, but wouldn't the rest of the logic still remain the same?

Lara Ipek Over a year ago

Not sure what you mean here, but I think your way works just fine; I'm just not sure if it would generate a random distribution. If you do decide to use a different random generator, your code would change like this: `for k,v in products.items(): itemlist.append(v[0]['productlist'][random_integer])`. The only problem here is that it generates random integers for the two lists separately, meaning you have two rounded-up distributions for the ranges (0,3) and (0,4), if that is indeed what you wanted.
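
A sketch of the change this comment describes, assuming NumPy is used to draw the indices. The mean at the middle of each list, the spread of a quarter of the list length, the seed, and names such as `n_rows` and `columns` are assumptions for illustration and are not given in the comment. Each list gets its own set of rounded, clipped indices:

```python
import numpy as np
import pandas as pd

# Same template as in the question, written as a dict for brevity.
products = {
    "product1": [{"category": "Fruits",
                  "productlist": ["Bell Peppers", "Red Chillies", "Onions", "Tomatoes"]}],
    "product2": [{"category": "Vegetables",
                  "productlist": ["Apple", "Mango", "Banana"]}],
}

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n_rows = 1000

columns = {}
for entries in products.values():
    category = entries[0]["category"]
    items = entries[0]["productlist"]
    # Assumed parameters: centre on the middle index, spread of len/4.
    raw = rng.normal(loc=(len(items) - 1) / 2, scale=len(items) / 4, size=n_rows)
    # Round to the nearest integer and clip to the valid index range.
    idx = np.clip(np.rint(raw), 0, len(items) - 1).astype(int)
    columns[category] = [items[i] for i in idx]

df = pd.DataFrame(columns)
print(df.head())
print(df["Fruits"].value_counts())
```

Building each column in one go and constructing the DataFrame once also avoids appending 1000 rows one at a time with `df.loc[len(df)]`.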

kingmakerking Over a year ago

From what you are suggesting, random_integer will be normally distributed, but not the values in the product list. The idea is for the product lists appended to the dataframe (as rows) to look like real occurrences.

Lara Ipek Over a year ago

Well, since your product list has items in it, there is no real way to have a normal distribution between them. The thing I'm describing is only useful if the lists are ordered. From what I can tell, any random function will do what you want, especially since the example lists are so small anyway.
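
For unordered items like these, one way to try a different distribution is to sample with explicit weights and then compare the observed shares with the intended ones. This is only a sketch: the weights, the seed, and the column name `product` are invented for illustration and do not come from the thread.

```python
import random

import pandas as pd

rng = random.Random(7)  # fixed seed so the run is repeatable

items = ["Apple", "Mango", "Banana"]
weights = [0.5, 0.3, 0.2]  # hypothetical relative frequencies

# random.choices draws with replacement according to the given weights.
sample = rng.choices(items, weights=weights, k=1000)
df = pd.DataFrame({"product": sample})

# Observed share of each item, to compare against the intended weights.
observed = df["product"].value_counts(normalize=True)
print(observed)
```

Swapping in a different `weights` list is then enough to try another shape, and `value_counts(normalize=True)` gives a quick check that the generated rows follow it.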
