$\begingroup$

There are classes A, B, C, D and E. Variable x has a different mean in each of these classes, but the range of x overlaps among classes. Item counts differ between classes, e.g. there are many more A's than B's. Given an item with a known value of x but unknown class, how do I calculate the probability of it falling into class A vs. class B vs. class C, etc.?

E.g., if x is close to the mean value for class C, there may be a 60% probability of this item falling into class C, 20% for class B, 10% for class D, 7% for class A and 3% for class E.

$\endgroup$
  • $\begingroup$ Do you know anything more about the distribution of x in each class, other than just the mean? If you can represent x's distribution in each class by a normal distribution with known mean and variance, for example, the problem becomes much simpler. $\endgroup$ Commented Jul 31, 2018 at 18:49
  • $\begingroup$ @Nuclear Wang. Yes, x's distribution within each class is known. However, it is not a normal distribution. This phenomenon tends to be very right-skewed. It becomes closer to normal when log-transformed. $\endgroup$ Commented Jul 31, 2018 at 18:53

1 Answer

$\begingroup$

Your problem is one of probabilistic multiclass classification. A classical statistical approach is multinomial logistic regression. There are also many machine learning approaches, such as classification trees (CART) or random forests.
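For concreteness, with a single predictor the multinomial logistic model takes the usual softmax form sketched below. The coefficient names $\alpha_k$ and $\beta_k$ are notation introduced here, and using $\log x$ rather than $x$ as the predictor is an assumption motivated by the right skew mentioned in the comments:

$$
P(\text{class} = k \mid x) = \frac{\exp(\alpha_k + \beta_k \log x)}{\sum_{j \in \{A, B, C, D, E\}} \exp(\alpha_j + \beta_j \log x)}
$$

One class is usually taken as the reference, with its $\alpha$ and $\beta$ fixed at zero so that the model is identifiable. The unequal class counts are absorbed by the fitted intercepts $\alpha_k$.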

(Multinomial) logistic regression outputs conditional class probabilities directly. For tree-based methods, you may need to request them explicitly. For instance, if you use the randomForest package in R, call predict(..., type = "prob") on the fitted model. Alternatively, use a dedicated implementation.
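If, as the comments on the question suggest, x is roughly normal within each class after a log transform, the class probabilities can also be computed directly with Bayes' rule: weight each class's density at the observed value by that class's share of the items, then normalise. The following is a minimal sketch under that assumption. The function names and all counts, means and standard deviations are hypothetical placeholders, not values from the question.

```python
import math


def normal_pdf(z, mean, sd):
    """Density of a normal distribution with the given mean and sd at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def class_posteriors(x, classes):
    """Return P(class | x) for every class via Bayes' rule.

    `classes` maps a class name to (item count, mean of log x, sd of log x).
    Assumes log x is approximately normal within each class. The 1/x
    Jacobian of the log transform is the same for every class, so it
    cancels in the normalisation and is omitted here.
    """
    z = math.log(x)
    total = sum(count for count, _, _ in classes.values())
    # prior (class share of all items) times likelihood (density at log x)
    joint = {
        name: (count / total) * normal_pdf(z, mean, sd)
        for name, (count, mean, sd) in classes.items()
    }
    evidence = sum(joint.values())
    return {name: value / evidence for name, value in joint.items()}


# Hypothetical class summaries: (count, mean of log x, sd of log x)
classes = {
    "A": (500, 0.0, 0.5),
    "B": (100, 0.5, 0.5),
    "C": (200, 1.0, 0.5),
    "D": (150, 1.5, 0.5),
    "E": (50, 2.0, 0.5),
}

# An item whose log x sits at the (hypothetical) class C mean
posteriors = class_posteriors(math.exp(1.0), classes)
for name in sorted(posteriors, key=posteriors.get, reverse=True):
    print(f"{name}: {posteriors[name]:.3f}")
```

In practice the per-class means and standard deviations would be estimated from the log-transformed training data, and the priors from the observed class counts. If the log-normal assumption fits poorly, any other per-class density estimate can be substituted for `normal_pdf` without changing the rest of the calculation.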

$\endgroup$
