$\begingroup$

I have some training data (TRAIN) and some test data (TEST). Each row of each table contains an observed class (X) and several columns of binary features (Y). I'm using a Python script that is meant to predict the probability (Pr) of X given Y in the test data, based on the training data. It uses a Bernoulli naive Bayes classifier. Here is my script:

https://stackoverflow.com/questions/55187516/look-up-bernoullinb-probability-in-dataframe

It works on the dummy data that is included with the script.

On the real data, I know from experience which class some of the Y columns indicate. However, my script gives predicted probabilities such as "1" for classes that I believe are wrong and "6e-77" for classes that I believe are correct.

Any advice on what I can try please?

Edit

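There are two problems. The very low probabilities are caused by the naive assumption that the features are independent of one another given the class, so correlated features are counted as separate pieces of evidence and the predicted probabilities are pushed towards 0 or 1. This is described here: https://scikit-learn.org/stable/auto_examples/calibration/plot_calibration_curve.html#sphx-glr-auto-examples-calibration-plot-calibration-curve-py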
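To illustrate how the independence assumption can produce such extreme values, here is a minimal sketch in plain Python. The function name `naive_posteriors` and the numbers (two classes, per-feature probabilities of 0.9 and 0.1, equal priors) are made up for illustration and are not taken from my data. It treats `n_copies` perfectly correlated features as if they were independent evidence:

```python
import math


def naive_posteriors(n_copies, p_a=0.9, p_b=0.1, prior_a=0.5):
    """Posterior of classes A and B after seeing n_copies features equal to 1.

    Hypothetical setup: every feature is 1 with probability p_a in class A
    and p_b in class B. Naive Bayes multiplies the per-feature likelihoods
    (sums them in log space) as if the features were independent.
    """
    log_a = math.log(prior_a) + n_copies * math.log(p_a)
    log_b = math.log(1.0 - prior_a) + n_copies * math.log(p_b)
    # Normalise in log space so the larger term becomes exp(0) = 1.
    m = max(log_a, log_b)
    z_a, z_b = math.exp(log_a - m), math.exp(log_b - m)
    total = z_a + z_b
    return z_a / total, z_b / total


for n in (1, 10, 80):
    post_a, post_b = naive_posteriors(n)
    print(n, post_a, post_b)
```

With a single feature the posterior for class A is moderate, but if the same signal is repeated across many correlated columns the posterior for class B shrinks towards zero, even though no new information has been added.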

The incorrect answers are caused by my code getting confused about which class is which, as described on my Stack Overflow post.
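One common way for this kind of mix-up to happen (which may or may not be exactly what is discussed in the linked post) is to assume a column order for the predicted probabilities. In scikit-learn the columns of `predict_proba` follow the order of the fitted classifier's `classes_` attribute. The sketch below uses plain lists so that it runs without scikit-learn; the helper name `prob_of_observed_class` and the toy values are hypothetical:

```python
def prob_of_observed_class(classes, proba_rows, observed):
    """Return, for each row, the predicted probability of its observed class.

    classes    : class labels in the column order of proba_rows
                 (with scikit-learn this would be clf.classes_)
    proba_rows : one sequence of probabilities per row
                 (with scikit-learn, clf.predict_proba(X_test))
    observed   : the observed class label of each row
    """
    column = {label: i for i, label in enumerate(classes)}
    result = []
    for row, label in zip(proba_rows, observed):
        # A label never seen in training has no column, so report None.
        result.append(row[column[label]] if label in column else None)
    return result


# Toy values, not real output from my script.
classes = ["a", "b", "c"]
proba_rows = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
print(prob_of_observed_class(classes, proba_rows, ["b", "c"]))
```

Looking the column up by label, rather than by position, avoids reading the probability of a different class by accident.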

$\endgroup$

1 Answer

$\begingroup$

Each binary column (Y) is a feature. The Bernoulli naive Bayes classifier could identify the class (X) when there were fewer than 17 features (Y). The real data had more features than that. I found that another method could classify it accurately. That method was:

Training:

(1) Record which features (Y) occur in each class (X) in the training data

Testing:

(2) Give each row a score (Z) with a starting value of 0.5

(3) For each row:

  • For each feature (Y) in the row that occurs in the class (X) in the training data, add 1 to the score (Z).

  • For each feature (Y) in the row that does not occur in the class (X) in the training data, subtract 1 from the score (Z).

  • If the class (X) is not in the training data, leave the score (Z) unchanged.

The score (Z) was a good classifier for my data.
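The steps above can be sketched roughly as follows. This is a minimal illustration rather than my exact script: the function names `train_feature_sets` and `score_row`, the row format (a class label plus a dict of feature values), and the choice to count only features whose value is 1 in the row are all assumptions made for the example.

```python
from collections import defaultdict


def train_feature_sets(rows):
    """Step (1): record which features occur (value 1) in each class.

    rows: iterable of (class_label, {feature_name: 0 or 1}).
    """
    seen = defaultdict(set)
    for label, features in rows:
        feats = seen[label]  # registers the class even if no feature is set
        for name, value in features.items():
            if value:
                feats.add(name)
    return dict(seen)


def score_row(label, features, seen, start=0.5):
    """Steps (2) and (3): score one test row against its class."""
    if label not in seen:
        return start  # class not in the training data: leave the score alone
    z = start
    for name, value in features.items():
        if not value:
            continue  # assumption: only features set to 1 in the row count
        z += 1 if name in seen[label] else -1
    return z


# Toy data for illustration only.
train = [
    ("a", {"f1": 1, "f2": 0, "f3": 1}),
    ("b", {"f1": 0, "f2": 1, "f3": 0}),
]
seen = train_feature_sets(train)
print(score_row("a", {"f1": 1, "f2": 1, "f3": 1}, seen))
```

A higher score means that more of the row's features were seen with that class during training, while a score that stays at the starting value means the class was not in the training data.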

$\endgroup$
