$\begingroup$

Let's say I have data with two target labels, A and B.

I want to design a random forest that has three outputs: A, B and Not sure.

Items in the Not sure category would be an approximately even mix of A and B.

I don't mind writing the RF from scratch.

Two questions:

  • What should my split criterion be?
  • Can this problem be reposed in a standard RF framework?
$\endgroup$

2 Answers

$\begingroup$

A standard decision tree (or random forest) predicts a probability that an instance belongs to the positive class (I'm assuming binary classification). This probability is based on exactly the idea you describe: given the feature values that lead to a particular leaf of the tree, if the proportion of positive training instances in that leaf (i.e. with these conditions on the features) is $p$, then a new instance reaching that leaf is assigned probability $p$ of being positive.

So basically you just have to obtain the predicted probability (instead of the class), and if this probability is close enough to 0.5 (e.g. between 0.4 and 0.6) you predict 'not sure'.

Naturally this probability is estimated from the training data. If the training data is not representative enough, or a test instance is too different from the training data, then the probability will be unreliable.
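As a minimal sketch of this idea, assuming scikit-learn and a synthetic dataset (the 0.4–0.6 band and the helper name `predict_with_abstention` are illustrative choices, not standard API):

```python
# Sketch: wrap a standard random forest so it outputs A, B, or "not sure"
# by thresholding the predicted probability. Assumes scikit-learn; the
# 0.4-0.6 band is an arbitrary example threshold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

def predict_with_abstention(clf, X, low=0.4, high=0.6):
    """Return 'A', 'B', or 'not sure' per instance, based on P(positive)."""
    p = clf.predict_proba(X)[:, 1]  # probability of the positive class (B)
    labels = np.full(len(p), "not sure", dtype=object)
    labels[p >= high] = "B"
    labels[p <= low] = "A"
    return labels

preds = predict_with_abstention(clf, X)
```

Widening the `(low, high)` band trades coverage for confidence: more instances land in 'not sure', but the remaining A/B predictions are more reliable.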

$\endgroup$
$\begingroup$

A Bayesian approach can model "not sure" as decision uncertainty. The Bayesian version of a random forest is often called a Bayesian forest. The goal is to generate a posterior distribution over trees, so there is no splitting on "not sure"; instead, move the "not sure" handling to the decision made after the forest has been estimated.
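A crude analogue of the posterior-over-trees idea can be sketched with an ordinary scikit-learn forest: treat disagreement among the individual trees as a decision-time uncertainty measure. This is a heuristic stand-in, not a true Bayesian forest, and the 0.4/0.6 cutoffs are arbitrary:

```python
# Heuristic sketch: use disagreement among the trees of a fitted forest
# as an uncertainty proxy, and abstain when the vote is nearly split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=1)
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# Fraction of trees voting for class 1, per instance.
votes = np.mean([tree.predict(X) for tree in forest.estimators_], axis=0)

# The decision happens after the forest is estimated, as the answer suggests.
decisions = np.full(len(X), "not sure", dtype=object)
decisions[votes >= 0.6] = "B"
decisions[votes <= 0.4] = "A"
```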

$\endgroup$
