
I need to predict (estimate) probabilities of (rare) events when the training data only contains the yes/no indicator.

I.e., my target (dependent) variable is binary (logical).

What I need is not just to predict yes/no, but to estimate the probability of yes/no for each observation.

If I use logistic regression, then the model output is indeed an estimate of the probability. But what if I am using a different model, e.g., Vowpal Wabbit (vw), because it is faster and outperforms logistic regression as a binary classifier?

So, I have a model which produces a score for each observation and I want to convert the score to probability.

It is natural to use the total variation distance to evaluate the probability predictions, which motivated my previous question. The accepted answer there suggests Liblinear with L1 loss, but that produces a binary classifier, not a probability estimator.

So, how do I calibrate model scores so that they actually estimate the event probabilities?

Currently, I train a logistic regression with the score as its single independent variable to map scores to probabilities. Can I do better?
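For reference, fitting a one-variable logistic regression on the scores is exactly Platt scaling. A minimal, dependency-free sketch (the function names, learning rate, and epoch count here are my own illustrative choices, not from any particular library):

```python
import math

def platt_scale(scores, labels, lr=0.05, epochs=2000):
    """Fit p(y=1 | score) = sigmoid(a*score + b) by batch gradient
    descent on the log loss, i.e. Platt scaling.
    scores: raw model outputs; labels: 0/1 outcomes."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s   # d(log loss)/da for this example
            grad_b += (p - y)       # d(log loss)/db for this example
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrated_prob(a, b, score):
    """Map a raw classifier score to an estimated event probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))
```

To avoid optimistic bias, the sigmoid should be fit on a held-out calibration set, not on the data the classifier was trained on. In practice, scikit-learn's `CalibratedClassifierCV` implements this with `method='sigmoid'` (Platt) or `method='isotonic'`; isotonic regression is the usual non-parametric alternative when you have enough data.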

  • If you check among my questions, you can find how to extract a probability from a decision-tree classifier. It is described in the book The Elements of Statistical Learning. Sorry, I'm on a tablet and can't give you the link. Commented May 23, 2014 at 21:36
  • I think you're referring to stats.stackexchange.com/questions/93202/… Commented May 24, 2014 at 6:57
  • Yes, it is that one. Let me know if it helps. Commented May 24, 2014 at 10:00
