I want to create a sample DataFrame, based on a JSON template, that looks as realistic as possible, which is why I would like the values to follow a normal distribution.
This is what I have tried:

```python
import json
import random

import pandas as pd

sample_data = """{"product1":[
    {"category":"Fruits", "productlist":["Bell Peppers","Red Chillies", "Onions", "Tomatoes"]}
],
"product2":[
    {"category":"Vegetables", "productlist":["Apple","Mango","Banana"]}
]}"""

products = json.loads(sample_data)

# One column per category in the template.
colHeaders = []
for k, v in products.items():
    colHeaders.append(v[0]['category'])

df = pd.DataFrame(columns=colHeaders)

# Build 1000 rows, picking one random product per category.
for i in range(1000):
    itemlist = []
    for k, v in products.items():
        itemlist.append(random.choice(v[0]['productlist']))
    # print(itemlist)
    df.loc[len(df)] = itemlist

print(df)
```

I am not sure I am doing this correctly. If not, please help me with:
- How to check if the data frame rows represent a normal distribution?
- How to try other distributions in this case?
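
To make the two points above concrete, here is a minimal sketch, not a definitive approach. Note that `random.choice` picks every item with equal probability, so the code above produces a uniform distribution. A normal distribution is defined for numeric values, so for categorical columns the closest check is to compare the observed share of each item with the share you intended. The `weights` values, the `scale=0.8` spread, and the helper name `shares` are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable

items = ["Bell Peppers", "Red Chillies", "Onions", "Tomatoes"]
n_rows = 1000

# 1. Uniform sampling: same distribution as random.choice.
uniform = pd.Series(rng.choice(items, size=n_rows))

# 2. Weighted sampling: hypothetical popularity weights that sum to 1.
weights = [0.1, 0.2, 0.4, 0.3]
weighted = pd.Series(rng.choice(items, size=n_rows, p=weights))

# 3. Bell-shaped sampling: draw a normal value, round it to the nearest
#    list index, and clip it into the valid index range.
centre = (len(items) - 1) / 2
raw = rng.normal(loc=centre, scale=0.8, size=n_rows)
idx = np.clip(np.rint(raw), 0, len(items) - 1).astype(int)
bell = pd.Series(np.array(items)[idx])


def shares(series):
    """Return the observed share of each item, in the order of `items`."""
    return series.value_counts(normalize=True).reindex(items, fill_value=0.0)


# Side-by-side comparison of the observed shares under each scheme.
summary = pd.DataFrame(
    {"uniform": shares(uniform), "weighted": shares(weighted), "bell": shares(bell)}
)
print(summary.round(3))
```

The `uniform` column should sit near 0.25 for every item, `weighted` should track the chosen weights, and `bell` should favour the items in the middle of the list. Whether the list order is a meaningful axis for a bell shape depends on the data, so the weighted variant is probably the more natural fit for product names.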
Other related Stack Overflow questions I have referred to are: