I'm fairly new to data science and ML. I have data on items going through a release process. For each item I have collected several variables, such as "product category", "product line", "design country" and "hour of day started", as well as "total time", which is the time the item took to go through the entire process. In total I have 18 input variables, each of which is either categorical or a discrete number such as "hour of day started".
A sample of the data:

    Design_cntry  Prod_category  prod_line  ...  time_minutes
    A             A1             A11        ...  43.2
    B             B1             A11        ...  20.1
    C             E1             B11        ...  15.0
    ...           ...            ...        ...  ...

I want to build a statistical regression model in Python that outputs the probability of a statement, for instance P(time > 1000 min | product category = A, product line = B, ...), and I am wondering how to tackle this problem. Are there any general ways of doing this? Or are there good articles or literature on this topic that anyone could recommend?
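To make the target quantity concrete, here is a minimal sketch of the naive baseline I have in mind: the empirical fraction of matching rows whose time exceeds the threshold. The rows are made-up toy values, not my real data, and the helper name `empirical_exceedance` is just something I chose for illustration. This presumably breaks down with 18 variables, since most combinations will have few or no rows, which is why I am looking for a model-based approach.

```python
# Toy rows (hypothetical values, not the real data set).
rows = [
    {"Prod_category": "A1", "prod_line": "A11", "time_minutes": 1200.0},
    {"Prod_category": "A1", "prod_line": "A11", "time_minutes": 43.2},
    {"Prod_category": "A1", "prod_line": "A11", "time_minutes": 1500.0},
    {"Prod_category": "B1", "prod_line": "A11", "time_minutes": 20.1},
    {"Prod_category": "E1", "prod_line": "B11", "time_minutes": 15.0},
]


def empirical_exceedance(rows, threshold, **conditions):
    """Fraction of rows matching `conditions` whose time exceeds `threshold`."""
    matching = [r for r in rows if all(r[k] == v for k, v in conditions.items())]
    if not matching:
        return None  # no observations for this combination of values
    return sum(r["time_minutes"] > threshold for r in matching) / len(matching)


# Estimate of P(time > 1000 min | Prod_category = A1, prod_line = A11)
p_hat = empirical_exceedance(rows, 1000, Prod_category="A1", prod_line="A11")
print(p_hat)
```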
My data is strictly non-negative, so are there perhaps suitable regression models based on the exponential distribution?
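As a rough illustration of what I mean by an exponential-based model, here is a sketch with a single categorical predictor. It assumes the time within each category is exponentially distributed, in which case the maximum-likelihood estimate of the mean is the group average and P(T > t) = exp(-t / mean). The data values and the function name `exceedance_prob` are hypothetical, and I do not know whether the exponential assumption actually fits my data.

```python
import math
from collections import defaultdict

# Hypothetical toy observations: (Prod_category, time_minutes).
data = [("A1", 43.2), ("A1", 900.0), ("A1", 1500.0), ("B1", 20.1), ("B1", 30.0)]

# Under an exponential model with one categorical predictor, the
# maximum-likelihood estimate of each group's mean is the group average.
sums, counts = defaultdict(float), defaultdict(int)
for category, minutes in data:
    sums[category] += minutes
    counts[category] += 1
mean_time = {category: sums[category] / counts[category] for category in sums}


def exceedance_prob(category, threshold):
    """P(T > threshold | category) for an exponential model: exp(-threshold / mean)."""
    return math.exp(-threshold / mean_time[category])


print(exceedance_prob("A1", 1000))
```

With more predictors, the group average would be replaced by a fitted mean that depends on all of them, but the exceedance formula would stay the same under this assumption.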