$\begingroup$

I'm fairly new to data science and ML. I have data on items going through a release process. I have collected several variables, such as "product category", "product line", "design country" and "hour of day started", and I also have "total time", which is the time it took the item to go through the entire process. In total I have 18 input variables, each of which is either categorical or a discrete number such as "hour of day started".

    Design_cntry  Prod_category  prod_line  ...  time_minutes
    A             A1             A11        ...  43.2
    B             B1             A11        ...  20.1
    C             E1             B11        ...  15.0
    ...           ...            ...        ...  ...

I want to build a statistical regression model in Python that outputs the probability of a statement, for instance P(time > 1000 min | product category = A, product line = B, ...). How should I tackle this problem? Are there any general ways of doing this, or good articles/literature on this topic that anyone could recommend?

I only have non-negative data, so maybe there are good regression models based on exponential distributions?

$\endgroup$

1 Answer

$\begingroup$

First, you have to pre-process your data. This includes encoding your categorical variables, which you can do with either pandas.get_dummies or sklearn.preprocessing.OneHotEncoder in your pipeline. Depending on the algorithm you want to use, you may also have to standardize your numerical variables. This can be done with one of the scalers in sklearn.preprocessing, such as StandardScaler.
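As a rough illustration of that pre-processing step, here is a minimal sketch using a ColumnTransformer. The toy values and the column name `hour_started` are made up for the example; only the other column names come from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column names mirror the question, values are invented.
df = pd.DataFrame({
    "Design_cntry": ["A", "B", "C", "A"],
    "Prod_category": ["A1", "B1", "E1", "A1"],
    "hour_started": [8, 13, 22, 9],
})

categorical = ["Design_cntry", "Prod_category"]
numeric = ["hour_started"]

# One-hot encode the categorical columns and standardize the numeric one.
# sparse_threshold=0 forces a dense NumPy array so the result is easy to inspect.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numeric),
    ],
    sparse_threshold=0,
)

X = preprocess.fit_transform(df)
print(X.shape)  # 3 + 3 one-hot columns plus 1 scaled numeric column
```

Setting handle_unknown="ignore" is one possible choice: a category that appears only at prediction time is then encoded as all zeros instead of raising an error.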

If you want to do logistic regression, you have to convert your output variable, time, into classes such as ['less than 1000 min', 'more than 1000 min']; the model then estimates the probability of each class given the inputs. Otherwise, if you want to keep time as a continuous value, you are doing multiple regression.
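To make the logistic regression route concrete, here is a small sketch that bins time at 1000 minutes and reads off P(time > 1000 min | inputs) with predict_proba. The data is synthetic (drawn from an exponential distribution purely for illustration) and the 1000-minute threshold is taken from the question, so treat this as an outline under those assumptions and not as a tuned model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the real data (assumption: values are invented).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Prod_category": rng.choice(["A1", "B1", "E1"], size=n),
    "prod_line": rng.choice(["A11", "B11"], size=n),
})
# Make category A1 slower on average so there is something to learn.
scale = np.where(df["Prod_category"] == "A1", 1500.0, 400.0)
df["time_minutes"] = rng.exponential(scale)

# Bin the target: 1 if the process took more than 1000 minutes, else 0.
y = (df["time_minutes"] > 1000).astype(int)
X = df[["Prod_category", "prod_line"]]

model = Pipeline([
    ("prep", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), list(X.columns))]
    )),
    ("clf", LogisticRegression()),
])
model.fit(X, y)

# Estimated P(time > 1000 min | category, line) for two hypothetical items.
new_items = pd.DataFrame({
    "Prod_category": ["A1", "B1"],
    "prod_line": ["A11", "A11"],
})
proba = model.predict_proba(new_items)[:, 1]
print(proba)
```

The second column of predict_proba corresponds to class 1, which here is "more than 1000 min". A different threshold requires re-binning the target and refitting.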

$\endgroup$
  • $\begingroup$ Thank you for the input. I believe that for the time being I will look into multiple regression. Do you have any literature, articles or examples you would recommend for this task? $\endgroup$ Commented Feb 7, 2019 at 18:17
